Much of the spectacular progress in biomedical science over the last half-century is the direct consequence of the work of thousands of basic scientists whose primary goal was understanding of the fundamental working of living things. Despite this, many politicians, funders, and even scientists have come to believe that the pace of successful applications to medical diagnosis and therapy is limited by our willingness to focus directly on human health, rather than a continuing deficit of understanding. By this theory, curiosity-driven research, aimed at understanding, is no longer important or even useful. What is advocated instead is “translational” research aimed directly at treating disease. I believe this idea to be deeply mistaken. Recent history suggests instead that what we have learned in the last 50 years is only the beginning. The way forward is to invest more in basic science, not less.
A general method for the dynamic control of single gene expression in eukaryotes, with no off-target effects, is a long-sought tool for molecular and systems biologists. We engineered two artificial transcription factors (ATFs) that contain Cys2His2 zinc-finger DNA-binding domains of either the mouse transcription factor Zif268 (9 bp of specificity) or a rationally designed array of four zinc fingers (12 bp of specificity). These domains were expressed as fusions to the human estrogen receptor and VP16 activation domain. The ATFs can rapidly induce a single gene driven by a synthetic promoter in response to introduction of an otherwise inert hormone with no detectable off-target effects. In the absence of inducer, the synthetic promoter is inactive and the regulated gene product is not detected. Following addition of inducer, transcripts are induced >50-fold within 15 min. We present a quantitative characterization of these ATFs and provide constructs for making their implementation straightforward. These new tools allow for the elucidation of regulatory network elements dynamically, which we demonstrate with a major metabolic regulator, Gcn4p.
Here we establish the utility of a recently described perturbative method to study complex regulatory circuits in vivo. By combining rapid modulation of single TFs under physiological conditions with genome-wide expression analysis, we elucidate several novel regulatory features within the pathways of sulfur assimilation and beyond.
In yeast, the pathways of sulfur assimilation are combinatorially controlled by five transcriptional regulators (three DNA-binding proteins [Met31p, Met32p, and Cbf1p], an activator [Met4p], and a cofactor [Met28p]) and a ubiquitin ligase subunit (Met30p). This regulatory system exerts combinatorial control not only over sulfur assimilation and methionine biosynthesis, but also on many other physiological functions in the cell. Recently we characterized a gene induction system that, upon the addition of an inducer, results in near-immediate transcription of a gene of interest under physiological conditions. We used this to perturb levels of single transcription factors during steady-state growth in chemostats, which facilitated distinction of direct from indirect effects of individual factors dynamically through quantification of the subsequent changes in genome-wide patterns of gene expression. We were able to show directly that Cbf1p acts sometimes as a repressor and sometimes as an activator. We also found circumstances in which Met31p/Met32p function as repressors, as well as those in which they function as activators. We elucidated and numerically modeled feedback relationships among the regulators, notably feedforward regulation of Met32p (but not Met31p) by Met4p that generates dynamic differences in abundance that can account for the differences in function of these two proteins despite their identical binding sites.
The sulfur assimilation pathway is used to understand how combinatorial transcription coordinates cellular processes. Global gene expression was measured in yeast lacking different combinations of transcription factors in order to determine how these factors coordinate sulfur assimilation with diverse metabolic and physiological processes.
Methionine abundance affects diverse cellular functions, including cell division, redox homeostasis, survival under starvation, and oxidative stress response. Regulation of the methionine biosynthetic pathway involves three DNA-binding proteins: Met31p, Met32p, and Cbf1p. We hypothesized that there exists a “division of labor” among these proteins that facilitates coordination of methionine biosynthesis with diverse biological processes. To explore combinatorial control in this regulatory circuit, we deleted CBF1, MET31, and MET32 individually and in combination in a strain lacking methionine synthase. We followed genome-wide gene expression as these strains were starved for methionine. Using a combination of bioinformatic methods, we found that these regulators control genes involved in biological processes downstream of sulfur assimilation; many of these processes had not previously been documented as methionine dependent. We also found that the different factors have overlapping but distinct functions. In particular, Met31p and Met32p are important in regulating methionine metabolism, whereas Cbf1p functions as a “generalist” transcription factor that is not specific to methionine metabolism. In addition, Met31p and Met32p appear to regulate iron–sulfur cluster biogenesis through direct and indirect mechanisms and have distinguishable target specificities. Finally, CBF1 deletion sometimes has the opposite effect on gene expression from MET31 and MET32 deletion.
Cytoprotective functions of a 20S proteasome activator were investigated. Saccharomyces cerevisiae Blm10 and human 20S proteasome activator 200 (PA200) are homologs. Comparative genome-wide analyses of untreated diploid cells lacking Blm10 and growing at steady state at defined growth rates revealed downregulation of numerous genes required for accurate chromosome structure, assembly and repair, and upregulation of a specific subset of genes encoding protein-folding chaperones. Blm10 loss or truncation of the Ubp3/Blm3 deubiquitinating enzyme caused massive chromosomal damage and cell death in homozygous diploids after phleomycin treatments, indicating that Blm10 and Ubp3/Blm3 function to stabilize the genome and protect against cell death. Diploids lacking Blm10 also were sensitized to doxorubicin, hydroxyurea, 5-fluorouracil, rapamycin, hydrogen peroxide, methyl methanesulfonate, and calcofluor. Fluorescently tagged Blm10 localized in nuclei, with enhanced fluorescence after DNA replication. After DNA damage that caused a classic G2/M arrest, fluorescence remained diffuse, with evidence of nuclear fragmentation in some cells. Protective functions of Blm10 did not require the carboxyl-terminal region that makes close contact with 20S proteasomes, indicating either that protection does not require this contact or that the truncated Blm10 can interact with the proteasome apart from this region. Without its carboxyl-terminus, Blm10(−339aa) localized to nuclei in untreated, nonproliferating (G0) cells, but not during G1, S, G2, and M. The results indicate Blm10 functions in protective mechanisms that include the machinery that assures proper assembly of chromosomes. These essential guardian functions have implications for ubiquitin-independent targeting in anticancer therapy. Targeting Blm10/PA200 together with one or more of the upregulated chaperones or a conventional treatment could be efficacious.
20S proteasome activator; BLM10/PA200; UBP3/BLM3; DNA damage; molecular chaperones
Transitions between the two phases of the cell growth cycle can account for the environmental stress response, the growth-rate response, and the cross-protection between slow growth and various types of stress factors. It is suggested that this mechanism is conserved across budding and fission yeast and normal human cells.
The respiratory metabolic cycle in budding yeast (Saccharomyces cerevisiae) consists of two phases that are most simply defined phenomenologically: low oxygen consumption (LOC) and high oxygen consumption (HOC). Each phase is associated with the periodic expression of thousands of genes, producing oscillating patterns of gene expression found in synchronized cultures and in single cells of slowly growing unsynchronized cultures. Systematic variation in the durations of the HOC and LOC phases can account quantitatively for well-studied transcriptional responses to growth rate differences. Here we show that a similar mechanism, transitions from the HOC phase to the LOC phase, can account for much of the common environmental stress response (ESR) and for the cross-protection by a preliminary heat stress (or slow growth rate) to subsequent lethal heat stress. We suggest that a metabolic cycle similar to that of budding yeast, and coupled in a similar way to the ESR, operates in the distantly related fission yeast, Schizosaccharomyces pombe, and in humans, and that it can explain gene expression and respiratory patterns observed in these eukaryotes. Although metabolic cycling is associated with the G0/G1 phase of the cell division cycle of slowly growing budding yeast, transcriptional cycling was detected in the G2 phase of the division cycle in fission yeast, consistent with the idea that respiratory metabolic cycling occurs during the phases of the cell division cycle associated with mass accumulation in these divergent eukaryotes.
We developed systems to rapidly express any yeast gene or to specifically degrade any protein, each with minimal untargeted disturbance of cell physiology. We illustrate applications of these new tools for elucidating the architecture and dynamics of genetic regulatory networks.
We describe the development and characterization of a system that allows the rapid and specific induction of individual genes in the yeast Saccharomyces cerevisiae without changes in nutrients or temperature. The system is based on the chimeric transcriptional activator Gal4dbd.ER.VP16 (GEV). Upon addition of the hormone β-estradiol, cytoplasmic GEV localizes to the nucleus and binds to promoters containing Gal4p consensus binding sequences to activate transcription. With galactokinase Gal1p and transcriptional activator Gal4p absent, the system is fast-acting, resulting in readily detectable transcription within 5 min after addition of the inducer. β-Estradiol is nearly a gratuitous inducer, as indicated by genome-wide profiling that shows unintended induction (by GEV) of only a few dozen genes. Response to inducer is graded: intermediate concentrations of inducer result in production of intermediate levels of product protein in all cells. We present data illustrating several applications of this system, including a modification of the regulated degron method, which allows rapid and specific degradation of a specific protein upon addition of β-estradiol. These gene induction and protein degradation systems provide important tools for studying the dynamics and functional relationships of genes and their respective regulatory networks.
The sulfur assimilation and phospholipid biosynthesis pathways interact metabolically and transcriptionally. Genetic analysis, genome-wide sequencing, and expression microarrays show that regulators of these pathways, Met4p and Opi1p, control cellular methylation capacity that can limit the growth rate.
A yeast strain lacking Met4p, the primary transcriptional regulator of the sulfur assimilation pathway, cannot synthesize methionine. This apparently simple auxotroph did not grow well in rich media containing excess methionine, forming small colonies on yeast extract/peptone/dextrose plates. Faster-growing large colonies were abundant when overnight cultures were plated, suggesting that spontaneous suppressors of the growth defect arise with high frequency. To identify the suppressor mutations, we used genome-wide single-nucleotide polymorphism and standard genetic analyses. The most common suppressors were loss-of-function mutations in OPI1, encoding a transcriptional repressor of phospholipid metabolism. Using a new system that allows rapid and specific degradation of Met4p, we could study the dynamic expression of all genes following loss of Met4p. Experiments using this system with and without Opi1p showed that Met4p activates and Opi1p represses genes that maintain levels of S-adenosylmethionine (SAM), the substrate for most methyltransferase reactions. Cells lacking Met4p grow normally when either SAM is added to the media or one of the SAM synthetase genes is overexpressed. SAM is used as a methyl donor in three Opi1p-regulated reactions to create the abundant membrane phospholipid, phosphatidylcholine. Our results show that rapidly growing cells require significant methylation, likely for the biosynthesis of phospholipids.
Metabolic gene clusters (functionally related and physically clustered genes) are a common feature of some eukaryotic genomes. Two hypotheses have been advanced to explain the origin and maintenance of metabolic gene clusters: coordinated gene expression and genetic linkage. Here we test the hypothesis that selection for coordinated gene expression underlies the clustering of GAL genes in the yeast genome. We find that, although clustering coordinates the expression of GAL1 and GAL10, disrupting the GAL cluster does not impair fitness, suggesting that other mechanisms, such as genetic linkage, drive the origin and maintenance of metabolic gene clusters.
We discovered that the relative durations of the phases of the yeast metabolic cycle change with the growth rate. These changes can explain mechanistically the transcriptional growth-rate responses of all yeast genes (25% of the genome) that we find to be the same across all studied nutrient limitations in either ethanol or glucose media.
We studied the steady-state responses to changes in growth rate of yeast when ethanol is the sole source of carbon and energy. Analysis of these data, together with data from studies where glucose was the carbon source, allowed us to distinguish a “universal” growth rate response (GRR) common to all media studied from a GRR specific to the carbon source. Genes with positive universal GRR include ribosomal, translation, and mitochondrial genes, and those with negative GRR include autophagy, vacuolar, and stress response genes. The carbon source–specific GRR genes control mitochondrial function, peroxisomes, and synthesis of vitamins and cofactors, suggesting this response may reflect the intensity of oxidative metabolism. All genes with universal GRR, which comprise 25% of the genome, are expressed periodically in the yeast metabolic cycle (YMC). We propose that the universal GRR may be accounted for by changes in the relative durations of the YMC phases. This idea is supported by oxygen consumption data from metabolically synchronized cultures with doubling times ranging from 5 to 14 h. We found that the high oxygen consumption phase of the YMC can coincide exactly with the S phase of the cell division cycle, suggesting that oxidative metabolism and DNA replication are not incompatible.
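The proposal that changing phase durations can produce a growth rate response can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions, not the analysis used in the study: it assumes a gene has one constant expression level in the HOC phase and another in the LOC phase, and that a bulk (population-averaged) measurement weights each level by the fraction of the cycle spent in that phase. The function name and all numbers are hypothetical.

```python
def mean_expression(hoc_hours, loc_hours, level_hoc, level_loc):
    """Time-averaged expression of one gene over a two-phase metabolic cycle.

    Assumption (not from the source): expression is constant within each
    phase, so the bulk average is a duration-weighted mean of the two levels.
    """
    period = hoc_hours + loc_hours
    if period <= 0:
        raise ValueError("total cycle duration must be positive")
    frac_hoc = hoc_hours / period
    return frac_hoc * level_hoc + (1.0 - frac_hoc) * level_loc


# Hypothetical gene expressed only in the HOC phase (level 10 vs 0).
# Lengthening the LOC phase while holding HOC fixed lowers the average.
fast = mean_expression(hoc_hours=1.0, loc_hours=1.0, level_hoc=10.0, level_loc=0.0)
slow = mean_expression(hoc_hours=1.0, loc_hours=3.0, level_hoc=10.0, level_loc=0.0)
print(fast, slow)
```

Under these assumptions a gene that peaks in the HOC phase shows a lower average level when the LOC phase is longer, and a gene that peaks in the LOC phase shows the opposite trend, without any change in the within-phase expression levels.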
The fate of a newly arising beneficial mutation depends on many factors, such as the population size and the availability and fitness effects of other mutations that accumulate in the population. It has proved difficult to understand how these factors influence the trajectories of particular mutations, since experiments have primarily focused on characterizing successful clones emerging from a small number of evolving populations. Here, we present the results of a massively parallel experiment designed to measure the full spectrum of possible fates of new beneficial mutations in hundreds of experimental yeast populations, whether these mutations are ultimately successful or not. Using strains in which a particular class of beneficial mutation is detectable by fluorescence, we followed the trajectories of these beneficial mutations across 592 independent populations for 1000 generations. We find that the fitness advantage provided by individual mutations plays a surprisingly small role. Rather, underlying “background” genetic variation is quickly generated in our initially clonal populations and plays a crucial role in determining the fate of each individual beneficial mutation in the evolving population.
Comparative analysis of predicted protein sequences encoded by the genomes of Caenorhabditis elegans and Saccharomyces cerevisiae suggests that most of the core biological functions are carried out by orthologous proteins (proteins of different species that can be traced back to a common ancestor) that occur in comparable numbers. The specialized processes of signal transduction and regulatory control that are unique to the multicellular worm appear to use novel proteins, many of which re-use conserved domains. Major expansion of the number of some of these domains seen in the worm may have contributed to the advent of multicellularity. The proteins conserved in yeast and worm are likely to have orthologs throughout eukaryotes; in contrast, the proteins unique to the worm may well define metazoans.
Genetic and physical maps for the 16 chromosomes of Saccharomyces cerevisiae are presented. The genetic map is the result of 40 years of genetic analysis. The physical map was produced from the results of an international systematic sequencing effort. The data for the maps are accessible electronically from the Saccharomyces Genome Database (SGD: http://genome-www.stanford.edu/Saccharomyces/).
The S. cerevisiae genome is the most well-characterized eukaryotic genome and one of the simplest in terms of identifying open reading frames (ORFs), yet its primary annotation has been updated continually in the decade since its initial release in 1996 (Goffeau et al., 1996). The Saccharomyces Genome Database (SGD; www.yeastgenome.org) (Hirschman et al., 2006), the community-designated repository for this reference genome, strives to ensure that the S. cerevisiae annotation is as accurate and useful as possible. At SGD, the S. cerevisiae genome sequence and annotation are treated as a working hypothesis, which must be repeatedly tested and refined. In this paper, in celebration of the tenth anniversary of the completion of the S. cerevisiae genome sequence, we discuss the ways in which the S. cerevisiae sequence and annotation have changed, consider the multiple sources of experimental and comparative data on which these changes are based, and describe our methods for evaluating, incorporating and documenting these new data.
S. cerevisiae; genome sequence; genome annotation; comparative genomics; exon/intron boundaries
GO::TermFinder comprises a set of object-oriented Perl modules for accessing Gene Ontology (GO) information and evaluating and visualizing the collective annotation of a list of genes to GO terms. It can be used to draw conclusions from microarray and other biological data, calculating the statistical significance of each annotation. GO::TermFinder can be used on any system on which Perl can be run, either as a command line application, in single or batch mode, or as a web-based CGI script.
The full source code and documentation for GO::TermFinder are freely available from http://search.cpan.org/dist/GO-TermFinder/
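To make the idea of "statistical significance of each annotation" concrete, the following is a minimal sketch of one common way to score enrichment of a term in a gene list: an upper-tail hypergeometric probability. It is written in Python for illustration and is not the GO::TermFinder Perl API; the function name and the example numbers are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Probability of seeing at least k annotated genes in a list of n genes.

    N: total genes in the background set
    K: background genes annotated to the term
    n: genes in the list of interest
    k: listed genes annotated to the term
    Genes are treated as drawn without replacement from the background.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the hypergeometric probability mass from k up to the maximum possible.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical example: 3 of 10 background genes carry the term,
# and at least 1 of a 2-gene list carries it.
print(hypergeom_enrichment_p(k=1, n=2, K=3, N=10))
```

In practice many terms are tested at once, so a raw tail probability like this would normally be adjusted for multiple hypotheses before drawing conclusions.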
The completion of the Saccharomyces cerevisiae genome sequencing project and the continued development of improved technology for large-scale genome analysis have led to tremendous growth in the amount of new yeast genetics and molecular biology data. Efficient organization, presentation, and dissemination of this information are essential if researchers are to exploit this knowledge. In addition, the development of tools that provide efficient analysis of this information and link it with pertinent information from other systems is becoming increasingly important at a time when the complete genome sequences of other organisms are becoming available. The aim of this review is to familiarize biologists with the type of data resources currently available on the World Wide Web (WWW).
World Wide Web; Saccharomyces Genome Database; Munich Information Center for Protein Sequences; Yeast Protein Database
A scientific database can be a powerful tool for biologists in an era where large-scale genomic analysis, combined with smaller-scale scientific results, provides new insights into the roles of genes and their products in the cell. However, the collection and assimilation of data is, in itself, not enough to make a database useful. The data must be incorporated into the database and presented to the user in an intuitive and biologically significant manner. Most importantly, this presentation must be driven by the user’s point of view; that is, from a biological perspective. The success of a scientific database can therefore be measured by the response of its users – statistically, by usage numbers and, in a less quantifiable way, by its relationship with the community it serves and its ability to serve as a model for similar projects. Since its inception ten years ago, the Saccharomyces Genome Database (SGD) has seen a dramatic increase in its usage, has developed and maintained a positive working relationship with the yeast research community, and has served as a template for at least one other database. The success of SGD, as measured by these criteria, is due in large part to philosophies that have guided its mission and organisation since it was established in 1993. This paper aims to detail these philosophies and how they shape the organisation and presentation of the database.
S. cerevisiae; database; genome-wide analysis; bioinformatics; yeast
Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.
Recently, a novel approach has been developed to study gene expression in single cells with high time resolution using RNA Fluorescent In Situ Hybridization (FISH). The technique allows individual mRNAs to be counted with high accuracy in wild-type cells, but requires cells to be fixed; thus, each cell provides only a “snapshot” of gene expression. Here we show how and when RNA FISH data on pairs of genes can be used to reconstruct real-time dynamics from a collection of such snapshots. Using maximum-likelihood parameter estimation on synthetically generated, noisy FISH data, we show that dynamical programs of gene expression, such as cycles (e.g., the cell cycle) or switches between discrete states, can be accurately reconstructed. In the limit that mRNAs are produced in short-lived bursts, binary thresholding of the FISH data provides a robust way of reconstructing dynamics. In this regime, prior knowledge of the type of dynamics – cycle versus switch – is generally required and additional constraints, e.g., from triplet FISH measurements, may also be needed to fully constrain all parameters. As a demonstration, we apply the thresholding method to RNA FISH data obtained from single, unsynchronized cells of Saccharomyces cerevisiae. Our results support the existence of metabolic cycles and provide an estimate of global gene-expression noise. The approach to FISH data presented here can be applied in general to reconstruct dynamics from snapshots of pairs of correlated quantities including, for example, protein concentrations obtained from immunofluorescence assays.
Programs of gene expression lie at the heart of how cells regulate their internal processes. Some dynamical gene-expression programs, such as the cell cycle, are well known and studied; others, such as metabolic cycles, have only recently been recognized; and many other dynamical programs, including switches, are likely to be discovered. Traditional bulk studies typically fail to resolve such cycles or switches, because individual cells are out-of-phase with each other. On the other hand, standard techniques for studying single cells are limited in time resolution and scope. RNA Fluorescent In Situ Hybridization (FISH) is a single-cell technique that offers both high time-resolution and precise quantification of mRNA molecules, but requires fixed cells. We have explored how, when, and with what prior information FISH snapshots of pairs of genes can be used to accurately reconstruct gene-expression dynamics. The technique can be readily implemented, and is broadly applicable from bacteria to mammals. We lay out a principled and practical approach to extracting biological information from RNA FISH data to reveal new information about the dynamics of living organisms.
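The binary-thresholding step described above can be sketched in a few lines. This is a simplified illustration, not the maximum-likelihood procedure of the paper: each cell's pair of mRNA counts is reduced to an on/off state for each gene, and the fraction of cells in each joint state is tallied. The interpretation of those fractions as fractions of cycle time relies on an assumption added here, namely that unsynchronized cells are spread uniformly in phase around a cycle. The thresholds, the state ordering, and the example counts are all hypothetical.

```python
from collections import Counter

# Hypothetical ordering of joint on/off states around a two-gene cycle:
# both off -> gene A on -> both on -> gene B on -> back to both off.
CYCLE_ORDER = [(0, 0), (1, 0), (1, 1), (0, 1)]


def binarize_pairs(pairs, thr_a, thr_b):
    """Convert (count_a, count_b) per cell into (0 or 1, 0 or 1) states."""
    return [(int(a >= thr_a), int(b >= thr_b)) for a, b in pairs]


def state_fractions(pairs, thr_a, thr_b):
    """Fraction of cells found in each joint binary state."""
    if not pairs:
        raise ValueError("need at least one cell")
    tally = Counter(binarize_pairs(pairs, thr_a, thr_b))
    n_cells = len(pairs)
    return {state: tally[state] / n_cells for state in CYCLE_ORDER}


# Hypothetical snapshot of four fixed cells: (mRNA count gene A, gene B).
cells = [(0, 0), (12, 1), (15, 20), (2, 18)]
print(state_fractions(cells, thr_a=5, thr_b=5))
```

Distinguishing a cycle from a switch still requires prior knowledge or extra constraints, as the text notes; the tallied fractions alone do not fix the order in which states are visited.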
Three articles from the early years of Molecular Biology of the Cell (MBoC) have had remarkably many citations in the literature since their publication ∼10 years ago. As a coauthor of these articles and the former editor of MBoC, I was asked for possible explanations. I believe the answer lies in the unusual nature of these articles: each presents and summarizes gene expression data for nearly every gene in the yeast or human genomes. Continuing interest in the data themselves by cell biologists, rather than results or conclusions drawn by the authors, best accounts for the citation history. The flatness of the numbers of citations over time, the continuing high rate of accesses to individual Web sites set up to allow searching and display of the underlying data, and the large fraction of citations in journals focused on mathematics and computation all support the same conclusion: it's the data.
We find that the metabolome of nutrient-limited yeast varies dramatically with the limiting nutrient's identity. Low glutamine is a hallmark of nitrogen limitation, ATP of phosphorus limitation, and pyruvate of carbon limitation. The availability of these metabolites can quantitatively account for the nutrient-limited yeast's growth rate.
Microbes tailor their growth rate to nutrient availability. Here, we measured, using liquid chromatography-mass spectrometry, >100 intracellular metabolites in steady-state cultures of Saccharomyces cerevisiae growing at five different rates and in each of five different limiting nutrients. In contrast to gene transcripts, where ∼25% correlated with growth rate irrespective of the nature of the limiting nutrient, metabolite concentrations were highly sensitive to the limiting nutrient's identity. Nitrogen (ammonium) and carbon (glucose) limitation were characterized by low intracellular amino acid and high nucleotide levels, whereas phosphorus (phosphate) limitation resulted in the converse. Low adenylate energy charge was found selectively in phosphorus limitation, suggesting the energy charge may actually measure phosphorus availability. Particularly strong concentration responses occurred in metabolites closely linked to the limiting nutrient, e.g., glutamine in nitrogen limitation, ATP in phosphorus limitation, and pyruvate in carbon limitation. A simple but physically realistic model involving the availability of these metabolites was adequate to account for cellular growth rate. The complete data can be accessed at the interactive website http://growthrate.princeton.edu/metabolome.
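For readers unfamiliar with the adenylate energy charge mentioned above, it is conventionally defined as ([ATP] + 0.5 [ADP]) / ([ATP] + [ADP] + [AMP]), which ranges from 0 (all AMP) to 1 (all ATP). The short sketch below computes it from three concentrations; the function name and the example values are hypothetical and are not measurements from this study.

```python
def adenylate_energy_charge(atp, adp, amp):
    """Energy charge = (ATP + 0.5 * ADP) / (ATP + ADP + AMP).

    Inputs are concentrations in any common unit; only their ratios matter.
    """
    if min(atp, adp, amp) < 0:
        raise ValueError("concentrations must be non-negative")
    total = atp + adp + amp
    if total == 0:
        raise ValueError("total adenylate pool must be positive")
    return (atp + 0.5 * adp) / total


# Hypothetical concentrations (arbitrary units), chosen only for illustration.
print(adenylate_energy_charge(atp=1.0, adp=1.0, amp=1.0))
```

Because the measure is a ratio, a drop in ATP lowers the energy charge only if ADP and AMP do not fall proportionally.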
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is a scientific database for the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker’s or budding yeast. The information in SGD includes functional annotations, mapping and sequence information, protein domains and structure, expression data, mutant phenotypes, physical and genetic interactions and the primary literature from which these data are derived. Here we describe how published phenotypes and genetic interaction data are annotated and displayed in SGD.
Yeast cells respond to a variety of environmental stresses, including heat shock and growth limitation. There is considerable overlap in these responses both from the point of view of gene expression patterns and cross-protection for survival. We performed experiments in which cells growing at different steady-state growth rates in chemostats were subjected to a short heat pulse. Gene expression patterns allowed us to partition genes whose expression responds to heat shock into subsets of genes that also respond to slow growth rate and those that do not. We found also that the degree of induction and repression of genes that respond to stress is generally weaker in respiratory deficient mutants, suggesting a role for increased respiratory activity in the apparent stress response to slow growth. Consistent with our gene expression results in wild-type cells, we found that cells growing more slowly are cross-protected for heat shock, i.e., better able to survive a lethal heat challenge. Surprisingly, however, we found no difference in cross-protection between respiratory-deficient and wild-type cells, suggesting induction of heat resistance at low growth rates is independent of respiratory activity, even though many of the changes in gene expression are not.