1.  Identification of cell cycle–regulated genes periodically expressed in U2OS cells and their regulation by FOXM1 and E2F transcription factors 
Molecular Biology of the Cell  2013;24(23):3634-3650.
Characterization of the cell cycle–regulated transcripts in U2OS cells yielded 1871 unique genes. FOXM1 targets were identified via ChIP-seq, and novel targets in G2/M and S phases were verified using a real-time luciferase assay. ChIP-seq data were used to map cell cycle transcriptional regulators of cell cycle–regulated gene expression in U2OS cells.
We identify the cell cycle–regulated mRNA transcripts genome-wide in the osteosarcoma-derived U2OS cell line. This results in 2140 transcripts mapping to 1871 unique cell cycle–regulated genes that show periodic oscillations across multiple synchronous cell cycles. We identify genomic loci bound by the G2/M transcription factor FOXM1 by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) and associate these with cell cycle–regulated genes. FOXM1 is bound to cell cycle–regulated genes with peak expression in both S phase and G2/M phases. We show that ChIP-seq genomic loci are responsive to FOXM1 using a real-time luciferase assay in live cells, showing that FOXM1 strongly activates promoters of G2/M phase genes and weakly activates those induced in S phase. Analysis of ChIP-seq data from a panel of cell cycle transcription factors (E2F1, E2F4, E2F6, and GABPA) from the Encyclopedia of DNA Elements and ChIP-seq data for the DREAM complex finds that a set of core cell cycle genes regulated in both U2OS and HeLa cells are bound by multiple cell cycle transcription factors. These data identify the cell cycle–regulated genes in a second cancer-derived cell line and provide a comprehensive picture of the transcriptional regulatory systems controlling periodic gene expression in the human cell division cycle.
PMCID: PMC3842991  PMID: 24109597
2.  Gene Ontology and the annotation of pathogen genomes: the case of Candida albicans 
Trends in Microbiology  2009;17(7):295-303.
The Gene Ontology (GO) is a structured controlled vocabulary developed to describe the roles and locations of gene products in a consistent fashion, in a way that can be shared across organisms. The unicellular fungus Candida albicans is similar in many ways to the model organism Saccharomyces cerevisiae, but as both a commensal and a pathogen of humans, differs greatly in its lifestyle. With an expanding at-risk population of immunosuppressed patients, increased use of invasive medical procedures, the increasing prevalence of drug resistance, and the emergence of additional Candida species as serious pathogens, it has never been more critical to improve our understanding of Candida biology to guide the development of better treatments. In this brief review, we examine the importance of GO in the annotation of C. albicans gene products, with a focus on those involved in pathogenesis. We also discuss how sequence information combined with GO facilitates the transfer of knowledge across related species, and the challenges and opportunities that such an approach presents.
PMCID: PMC3907193  PMID: 19577928
3.  PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools 
Nucleic Acids Research  2013;42(D1):D677-D684.
PortEco aims to collect, curate and provide data and analysis tools to support basic biological research in Escherichia coli (and eventually other bacterial systems). PortEco is implemented as a ‘virtual’ model organism database that provides a single unified interface to the user, while integrating information from a variety of sources. The main focus of PortEco is to enable broad use of the growing number of high-throughput experiments available for E. coli, and to leverage community annotation through the EcoliWiki and GONUTS systems. Currently, PortEco includes curated data from hundreds of genome-wide RNA expression studies, from high-throughput phenotyping of single-gene knockouts under hundreds of annotated conditions, from chromatin immunoprecipitation experiments for tens of different DNA-binding factors and from ribosome profiling experiments that yield insights into protein expression. Conditions have been annotated with a consistent vocabulary, and data have been consistently normalized to enable users to find, compare and interpret relevant experiments. PortEco includes tools for data analysis, including clustering, enrichment analysis and exploration via genome browsers. PortEco search and data analysis tools are extensively linked to the curated gene, metabolic pathway and regulation content at its sister site, EcoCyc.
PMCID: PMC3965092  PMID: 24285306
4.  Whole Genome, Whole Population Sequencing Reveals That Loss of Signaling Networks Is the Major Adaptive Strategy in a Constant Environment 
PLoS Genetics  2013;9(11):e1003972.
Molecular signaling networks are ubiquitous across life and likely evolved to allow organisms to sense and respond to environmental change in dynamic environments. Few examples exist regarding the dispensability of signaling networks, and it remains unclear whether they are an essential feature of a highly adapted biological system. Here, we show that signaling network function carries a fitness cost in yeast evolving in a constant environment. We performed whole-genome, whole-population Illumina sequencing on replicate evolution experiments and find the major theme of adaptive evolution in a constant environment is the disruption of signaling networks responsible for regulating the response to environmental perturbations. Over half of all identified mutations occurred in three major signaling networks that regulate growth control: glucose signaling, Ras/cAMP/PKA and HOG. This results in a loss of environmental sensitivity that is reproducible across experiments. However, adaptive clones show reduced viability under starvation conditions, demonstrating an evolutionary tradeoff. These mutations are beneficial in an environment with a constant and predictable nutrient supply, likely because they result in constitutive growth, but reduce fitness in an environment where nutrient supply is not constant. Our results are a clear example of the myopic nature of evolution: a loss of environmental sensitivity in a constant environment is adaptive in the short term, but maladaptive should the environment change.
Author Summary
When a population of organisms is faced with a selective pressure, such as a limiting nutrient, mutations that arise randomly may confer a fitness benefit on the individual carrying that mutation. If that individual reproduces before it is lost from the population, the frequency of that mutation may increase. Over time, many beneficial mutations will arise in a large population, but there are few high resolution experiments tracking the frequency of such mutations in an evolving population. We evolved populations of the baker's yeast in a constant environment in the presence of limiting amounts of sugar, and then used DNA sequencing to identify mutations that reached at least a 1% frequency in the population, and tracked them over time. We identified 120 mutations over three experiments, and determined that the genes and pathways that had gained beneficial mutations were largely reproducible across experiments, and that many of the mutations led to the loss of signaling pathways that usually sense a changing environment, allowing the cell to respond appropriately. When these mutant cells were faced with uncertain environments, the mutations proved to be deleterious. Environmental sensing must carry a fitness cost in a constant environment, but is essential in a changing one.
PMCID: PMC3836717  PMID: 24278038
5.  The Aspergillus Genome Database: multispecies curation and incorporation of RNA-Seq data to improve structural gene annotations 
Nucleic Acids Research  2013;42(D1):D705-D710.
The Aspergillus Genome Database (AspGD) is a freely available web-based resource that was designed for Aspergillus researchers and is also a valuable source of information for the entire fungal research community. In addition to being a repository and central point of access to genome, transcriptome and polymorphism data, AspGD hosts a comprehensive comparative genomics toolbox that facilitates the exploration of precomputed orthologs among the 20 currently available Aspergillus genomes. AspGD curators perform gene product annotation based on review of the literature for four key Aspergillus species: Aspergillus nidulans, Aspergillus oryzae, Aspergillus fumigatus and Aspergillus niger. We have iteratively improved the structural annotation of Aspergillus genomes through the analysis of publicly available transcription data, mostly expressed sequence tags, as described in a previous NAR Database article (Arnaud et al. 2012). In this update, we report substantive structural annotation improvements for A. nidulans, A. oryzae and A. fumigatus genomes based on recently available RNA-Seq data. Over 26 000 loci were updated across these species; although those primarily comprise the addition and extension of untranslated regions (UTRs), the new analysis also enabled over 1000 modifications affecting the coding sequence of genes in each target genome.
PMCID: PMC3965050  PMID: 24194595
6.  The Candida Genome Database: The new homology information page highlights protein similarity and phylogeny 
Nucleic Acids Research  2013;42(D1):D711-D716.
The Candida Genome Database (CGD) is a freely available online resource that provides gene, protein and sequence information for multiple Candida species, along with web-based tools for accessing, analyzing and exploring these data. The goal of CGD is to facilitate and accelerate research into Candida pathogenesis and biology. The CGD Web site is organized around Locus pages, which display information collected about individual genes. Locus pages have multiple tabs for accessing different types of information; the default Summary tab provides an overview of the gene name, aliases, phenotype and Gene Ontology curation, whereas other tabs display more in-depth information, including protein product details for coding genes, notes on changes to the sequence or structure of the gene and a comprehensive reference list. Here, in this update to previous NAR Database articles featuring CGD, we describe a new tab that we have added to the Locus page, entitled the Homology Information tab, which displays phylogeny and gene similarity information for each locus.
PMCID: PMC3965001  PMID: 24185697
7.  Ras Signaling Gets Fine-Tuned: Regulation of Multiple Pathogenic Traits of Candida albicans 
Eukaryotic Cell  2013;12(10):1316-1325.
Candida albicans is an opportunistic fungal pathogen that can cause disseminated infection in patients with indwelling catheters or other implanted medical devices. A common resident of the human microbiome, C. albicans responds to environmental signals, such as cell contact with catheter materials and exposure to serum or CO2, by triggering the expression of a variety of traits, some of which are known to contribute to its pathogenic lifestyle. Such traits include adhesion, biofilm formation, filamentation, white-to-opaque (W-O) switching, and two recently described phenotypes, finger and tentacle formation. Under distinct sets of environmental conditions and in specific cell types (mating type-like a [MTLa]/alpha cells, MTL homozygotes, or daughter cells), C. albicans utilizes (or reutilizes) a single signal transduction pathway—the Ras pathway—to affect these phenotypes. Ras1, Cyr1, Tpk2, and Pde2, the proteins of the Ras signaling pathway, are the only nontranscriptional regulatory proteins that are known to be essential for regulating all of these processes. How does C. albicans utilize this one pathway to regulate all of these phenotypes? The regulation of distinct and yet related processes by a single, evolutionarily conserved pathway is accomplished through the use of downstream transcription factors that are active under specific environmental conditions and in different cell types. In this minireview, we discuss the role of Ras signaling pathway components and Ras pathway-regulated transcription factors as well as the transcriptional regulatory networks that fine-tune gene expression in diverse biological contexts to generate specific phenotypes that impact the virulence of C. albicans.
PMCID: PMC3811338  PMID: 23913542
8.  Turbidostat Culture of Saccharomyces cerevisiae W303-1A under Selective Pressure Elicited by Ethanol Selects for Mutations in SSD1 and UTH1 
FEMS Yeast Research  2012;12(5):521-533.
We investigated the genetic causes of ethanol tolerance by characterizing mutations selected in Saccharomyces cerevisiae W303-1A under the selective pressure of ethanol. W303-1A was subjected to three rounds of turbidostat culture, in medium supplemented with increasing amounts of ethanol. By the end of selection, the growth rate of the culture had increased from 0.029 h⁻¹ to 0.32 h⁻¹. Unlike the progenitor strain, all yeast cells isolated from this population were able to form colonies on medium supplemented with 7% ethanol within six days, our definition of ethanol tolerance. Several clones selected from all three stages of selection were able to form dense colonies within two days on solid medium supplemented with 9% ethanol. We sequenced the whole genomes of 6 clones and identified mutations responsible for ethanol tolerance. Thirteen additional clones were tested for the presence of similar mutations. In 15 out of 19 tolerant clones the stop codon in ssd1-d was replaced with an amino acid-encoding codon. Three other clones contained one of two mutations in UTH1, and one clone did not contain mutations in either SSD1 or UTH1. We showed that the mutations in SSD1 and UTH1 increased tolerance of the cell wall to zymolyase and conclude that stability of the cell wall is a major factor in increased tolerance to ethanol.
PMCID: PMC3393845  PMID: 22443114
ethanol tolerance; SSD1; UTH1; turbidostat; cell wall
9.  Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae 
BMC Microbiology  2013;13:91.
Secondary metabolite production, a hallmark of filamentous fungi, is an expanding area of research for the Aspergilli. These compounds are potent chemicals, ranging from deadly toxins to therapeutic antibiotics to potential anti-cancer drugs. The genome sequences for multiple Aspergilli have been determined, and provide a wealth of predictive information about secondary metabolite production. Sequence analysis and gene overexpression strategies have enabled the discovery of novel secondary metabolites and the genes involved in their biosynthesis. The Aspergillus Genome Database (AspGD) provides a central repository for gene annotation and protein information for Aspergillus species. These annotations include Gene Ontology (GO) terms, phenotype data, gene names and descriptions and they are crucial for interpreting both small- and large-scale data and for aiding in the design of new experiments that further Aspergillus research.
We have manually curated Biological Process GO annotations for all genes in AspGD with recorded functions in secondary metabolite production, adding new GO terms that specifically describe each secondary metabolite. We then leveraged these new annotations to predict roles in secondary metabolism for genes lacking experimental characterization. As a starting point for manually annotating Aspergillus secondary metabolite gene clusters, we used antiSMASH (antibiotics and Secondary Metabolite Analysis SHell) and SMURF (Secondary Metabolite Unknown Regions Finder) algorithms to identify potential clusters in A. nidulans, A. fumigatus, A. niger and A. oryzae, which we subsequently refined through manual curation.
This set of 266 manually curated secondary metabolite gene clusters will facilitate the investigation of novel Aspergillus secondary metabolites.
PMCID: PMC3689640  PMID: 23617571
Aspergillus; Gene clusters; Gene Ontology; Genome annotation; Secondary metabolism; Sybil
10.  Recurrent Rearrangement during Adaptive Evolution in an Interspecific Yeast Hybrid Suggests a Model for Rapid Introgression 
PLoS Genetics  2013;9(3):e1003366.
Genome rearrangements are associated with eukaryotic evolutionary processes ranging from tumorigenesis to speciation. Rearrangements are especially common following interspecific hybridization, and some of these could be expected to have strong selective value. To test this expectation we created de novo interspecific yeast hybrids between two diverged but largely syntenic Saccharomyces species, S. cerevisiae and S. uvarum, then experimentally evolved them under continuous ammonium limitation. We discovered that a characteristic interspecific genome rearrangement arose multiple times in independently evolved populations. We uncovered nine different breakpoints, all occurring in a narrow ∼1-kb region of chromosome 14, and all producing an “interspecific fusion junction” within the MEP2 gene coding sequence, such that the 5′ portion derives from S. cerevisiae and the 3′ portion derives from S. uvarum. In most cases the rearrangements altered both chromosomes, resulting in what can be considered to be an introgression of a several-kb region of S. uvarum into an otherwise intact S. cerevisiae chromosome 14, while the homeologous S. uvarum chromosome 14 experienced an interspecific reciprocal translocation at the same breakpoint within MEP2, yielding a chimaeric chromosome; these events result in the presence in the cell of two MEP2 fusion genes having identical breakpoints. Given that MEP2 encodes for a high-affinity ammonium permease, that MEP2 fusion genes arise repeatedly under ammonium-limitation, and that three independent evolved isolates carrying MEP2 fusion genes are each more fit than their common ancestor, the novel MEP2 fusion genes are very likely adaptive under ammonium limitation. Our results suggest that, when homoploid hybrids form, the admixture of two genomes enables swift and otherwise unavailable evolutionary innovations. 
Furthermore, the architecture of the MEP2 rearrangement suggests a model for rapid introgression, a phenomenon seen in numerous eukaryotic phyla, that does not require repeated backcrossing to one of the parental species.
Author Summary
Interspecific hybridization occurs when two different species mate and produce viable offspring. While hybrid offspring are usually sterile, like the mule, which results from a horse–donkey mating, sometimes they are fertile, creating new species. Indeed, many plant and animal species have arisen via this mechanism. Because interspecific hybridization occurs between different yeast species, and because they are such tractable models, yeast are ideally suited for experimentally investigating the genomic consequences of interspecific hybridization. We created an interspecific yeast hybrid by crossing S. cerevisiae and S. uvarum, and then studied genomic changes that occurred as it adaptively evolved in a stressful nitrogen-limiting environment. We discovered that a characteristic rearrangement between the parental species' chromosomes evolved independently many times, and always within a particular gene encoding a protein that imports nitrogen into the cell. Evolved hybrids carrying this rearrangement grew faster under nitrogen-limitation than ancestral hybrids, suggesting that the rearrangement is beneficial in nitrogen-poor environments. Our results suggest that having the genomes of two different species within a cell provides novel sources of variation for evolution to act upon, leading to adaptations that could not occur in either parental species.
PMCID: PMC3605161  PMID: 23555283
11.  Improved Gene Ontology Annotation for Biofilm Formation, Filamentous Growth, and Phenotypic Switching in Candida albicans 
Eukaryotic Cell  2013;12(1):101-108.
The opportunistic fungal pathogen Candida albicans is a significant medical threat, especially for immunocompromised patients. Experimental research has focused on specific areas of C. albicans biology, with the goal of understanding the multiple factors that contribute to its pathogenic potential. Some of these factors include cell adhesion, invasive or filamentous growth, and the formation of drug-resistant biofilms. The Gene Ontology (GO) is a standardized vocabulary that the Candida Genome Database (CGD) and other groups use to describe the functions of gene products. To improve the breadth and accuracy of pathogenicity-related gene product descriptions and to facilitate the description of as yet uncharacterized but potentially pathogenicity-related genes in Candida species, CGD undertook a three-part project: first, the addition of terms to the biological process branch of the GO to improve the description of fungus-related processes; second, manual recuration of gene product annotations in CGD to use the improved GO vocabulary; and third, computational ortholog-based transfer of GO annotations from experimentally characterized gene products, using these new terms, to uncharacterized orthologs in other Candida species. Through genome annotation and analysis, we identified candidate pathogenicity genes in seven non-C. albicans Candida species and in one additional C. albicans strain, WO-1. We also defined a set of C. albicans genes at the intersection of biofilm formation, filamentous growth, pathogenesis, and phenotypic switching of this opportunistic fungal pathogen, which provides a compelling list of candidates for further experimentation.
PMCID: PMC3535841  PMID: 23143685
12.  APJ1 and GRE3 Homologs Work in Concert to Allow Growth in Xylose in a Natural Saccharomyces sensu stricto Hybrid Yeast 
Genetics  2012;191(2):621-632.
Creating Saccharomyces yeasts capable of efficient fermentation of pentoses such as xylose remains a key challenge in the production of ethanol from lignocellulosic biomass. Metabolic engineering of industrial Saccharomyces cerevisiae strains has yielded xylose-fermenting strains, but these strains have not yet achieved industrial viability due largely to xylose fermentation being prohibitively slower than that of glucose. Recently, it has been shown that naturally occurring xylose-utilizing Saccharomyces species exist. Uncovering the genetic architecture of such strains will shed further light on xylose metabolism, suggesting additional engineering approaches or possibly even enabling the development of xylose-fermenting yeasts that are not genetically modified. We previously identified a hybrid yeast strain, the genome of which is largely Saccharomyces uvarum, which has the ability to grow on xylose as the sole carbon source. To circumvent the sterility of this hybrid strain, we developed a novel method to genetically characterize its xylose-utilization phenotype, using a tetraploid intermediate, followed by bulk segregant analysis in conjunction with high-throughput sequencing. We found that this strain’s growth in xylose is governed by at least two genetic loci, within which we identified the responsible genes: one locus contains a known xylose-pathway gene, a novel homolog of the aldo-keto reductase gene GRE3, while a second locus contains a homolog of APJ1, which encodes a putative chaperone not previously connected to xylose metabolism. Our work demonstrates that the power of sequencing combined with bulk segregant analysis can also be applied to a nongenetically tractable hybrid strain that contains a complex, polygenic trait, and identifies new avenues for metabolic engineering as well as for construction of nongenetically modified xylose-fermenting strains.
PMCID: PMC3374322  PMID: 22426884
growth in xylose; bulk segregant analysis; Saccharomyces hybrid; genome sequencing; lignocellulosic ethanol
13.  Different selective pressures lead to different genomic outcomes as newly-formed hybrid yeasts evolve 
Interspecific hybridization occurs in every eukaryotic kingdom. While hybrid progeny are frequently at a selective disadvantage, in some instances their increased genome size and complexity may result in greater stress resistance than their ancestors, which can be adaptively advantageous at the edges of their ancestors' ranges. While this phenomenon has been repeatedly documented in the field, the response of hybrid populations to long-term selection has not often been explored in the lab. To fill this knowledge gap we crossed the two most distantly related members of the Saccharomyces sensu stricto group, S. cerevisiae and S. uvarum, and established a mixed population of homoploid and aneuploid hybrids to study how different types of selection impact hybrid genome structure.
As temperature was raised incrementally from 31°C to 46.5°C over 500 generations of continuous culture, selection favored loss of the S. uvarum genome, although the kinetics of genome loss differed among independent replicates. Temperature-selected isolates exhibited greater inherent and induced thermal tolerance than parental species and founding hybrids, and also exhibited ethanol resistance. In contrast, as exogenous ethanol was increased from 0% to 14% over 500 generations of continuous culture, selection favored euploid S. cerevisiae x S. uvarum hybrids. Ethanol-selected isolates were more ethanol tolerant than S. uvarum and one of the founding hybrids, but did not exhibit resistance to temperature stress. Relative to parental and founding hybrids, temperature-selected strains showed heritable differences in cell wall structure in the forms of increased resistance to zymolyase digestion and Micafungin, which targets cell wall biosynthesis.
This is the first study to show experimentally that the genomic fate of newly-formed interspecific hybrids depends on the type of selection they encounter during the course of evolution, underscoring the importance of the ecological theatre in determining the outcome of the evolutionary play.
PMCID: PMC3372441  PMID: 22471618
14.  GC-Content Normalization for RNA-Seq Data 
BMC Bioinformatics  2011;12:480.
Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof.
We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq.
Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes.
PMCID: PMC3315510  PMID: 22177264
15.  The Aspergillus Genome Database (AspGD): recent developments in comprehensive multispecies curation, comparative genomics and community resources 
Nucleic Acids Research  2011;40(D1):D653-D659.
The Aspergillus Genome Database (AspGD) is a freely available, web-based resource for researchers studying fungi of the genus Aspergillus, which includes organisms of clinical, agricultural and industrial importance. AspGD curators have now completed comprehensive review of the entire published literature about Aspergillus nidulans and Aspergillus fumigatus, and this annotation is provided with streamlined, ortholog-based navigation of the multispecies information. AspGD facilitates comparative genomics by providing a full-featured genomics viewer, as well as matched and standardized sets of genomic information for the sequenced aspergilli. AspGD also provides resources to foster interaction and dissemination of community information and resources. We welcome and encourage feedback at
PMCID: PMC3245136  PMID: 22080559
16.  The Candida genome database incorporates multiple Candida species: multispecies search and analysis tools with curated gene and protein information for Candida albicans and Candida glabrata 
Nucleic Acids Research  2011;40(D1):D667-D674.
The Candida Genome Database (CGD) is an internet-based resource that provides centralized access to genomic sequence data and manually curated functional information about genes and proteins of the fungal pathogen Candida albicans and other Candida species. As the scope of Candida research, and the number of sequenced strains and related species, has grown in recent years, the need for expanded genomic resources has also grown. To answer this need, CGD has expanded beyond storing data solely for C. albicans, now integrating data from multiple species. Herein we describe the incorporation of this multispecies information, which includes curated gene information and the reference sequence for C. glabrata, as well as orthology relationships that interconnect Locus Summary pages, allowing easy navigation between genes of C. albicans and C. glabrata. These orthology relationships are also used to predict GO annotations of their products. We have also added protein information pages that display domains, structural information and physicochemical properties; bibliographic pages highlighting important topic areas in Candida biology; and a laboratory strain lineage page that describes the lineage of commonly used laboratory strains. All of these data are freely available at We welcome feedback from the research community at
PMCID: PMC3245171  PMID: 22064862
17.  Hunger Artists: Yeast Adapted to Carbon Limitation Show Trade-Offs under Carbon Sufficiency 
PLoS Genetics  2011;7(8):e1002202.
As organisms adaptively evolve to a new environment, selection results in the improvement of certain traits, bringing about an increase in fitness. Trade-offs may result from this process if function in other traits is reduced in alternative environments either by the adaptive mutations themselves or by the accumulation of neutral mutations elsewhere in the genome. Though the cost of adaptation has long been a fundamental premise in evolutionary biology, the existence of and molecular basis for trade-offs in alternative environments are not well-established. Here, we show that yeast evolved under aerobic glucose limitation show surprisingly few trade-offs when cultured in other carbon-limited environments, under either aerobic or anaerobic conditions. However, while adaptive clones consistently outperform their common ancestor under carbon limiting conditions, in some cases they perform less well than their ancestor in aerobic, carbon-rich environments, indicating that trade-offs can appear when resources are non-limiting. To more deeply understand how adaptation to one condition affects performance in others, we determined steady-state transcript abundance of adaptive clones grown under diverse conditions and performed whole-genome sequencing to identify mutations that distinguish them from one another and from their common ancestor. We identified mutations in genes involved in glucose sensing, signaling, and transport, which, when considered in the context of the expression data, help explain their adaptation to carbon poor environments. However, different sets of mutations in each independently evolved clone indicate that multiple mutational paths lead to the adaptive phenotype. We conclude that yeasts that evolve high fitness under one resource-limiting condition also become more fit under other resource-limiting conditions, but may pay a fitness cost when those same resources are abundant.
Author Summary
Microorganisms such as yeast have been used for decades to study adaptive evolution by natural selection. Thirty years ago, in now-seminal experiments, a strain of yeast was evolved multiple times under carbon limitation. The adaptive changes that gave rise to increases in fitness have previously been studied both phenomenologically and mechanistically but not in detail at the molecular level. To better understand the basis for these strains' fitness increase, we sequenced their genomes and identified putative adaptive mutations. We found that multiple mutational paths lead to these fitness increases. We also asked whether the evolved yeasts' gains in fitness under the original conditions in some cases diminished fitness under other conditions. To answer this, we evaluated their performance relative to the ancestral strain under the evolutionary and two alternative resource-limiting conditions by determining the ancestral and evolved strains' relative fitnesses and gene-expression levels under all three conditions. We found scant evidence among evolved strains for fitness trade-offs when nutrients were scarce, but discovered that a cost was paid when nutrients were plentiful.
PMCID: PMC3150441  PMID: 21829391
18.  Reciprocal Sign Epistasis between Frequently Experimentally Evolved Adaptive Mutations Causes a Rugged Fitness Landscape 
PLoS Genetics  2011;7(4):e1002056.
The fitness landscape captures the relationship between genotype and evolutionary fitness and is a pervasive metaphor used to describe the possible evolutionary trajectories of adaptation. However, little is known about the actual shape of fitness landscapes, including whether valleys of low fitness create local fitness optima, acting as barriers to adaptive change. Here we provide evidence of a rugged molecular fitness landscape arising during an evolution experiment in an asexual population of Saccharomyces cerevisiae. We identify the mutations that arose during the evolution using whole-genome sequencing and use competitive fitness assays to describe the mutations individually responsible for adaptation. In addition, we find that a fitness valley between two adaptive mutations in the genes MTH1 and HXT6/HXT7 is caused by reciprocal sign epistasis, where the fitness cost of the double mutant prohibits the two mutations from being selected in the same genetic background. The constraint enforced by reciprocal sign epistasis causes the mutations to remain mutually exclusive during the experiment, even though adaptive mutations in these two genes occur several times in independent lineages during the experiment. Our results show that epistasis plays a key role during adaptation and that inter-genic interactions can act as barriers between adaptive solutions. These results also provide a new interpretation of the classic Dobzhansky-Muller model of reproductive isolation and display some surprising parallels with mutations in genes often associated with tumors.
Author Summary
How organisms adapt to their environment is of central importance in biology, but the molecular underpinnings of adaptation are difficult to discover. Fitness landscapes illustrate possible steps adaptive evolution can take to increase the evolutionary fitness of individuals within a population, and the shape of the fitness landscape determines the accessibility of the fittest point on the landscape. On a rugged landscape, negative interactions between mutations cause fitness valleys separating fitness peaks, which can constrain adaptation and act as an adaptive barrier. Here, we comprehensively characterized the fitness of mutations that arose in clones during a yeast experimental evolution and found that mutations in two loci, MTH1 and HXT6/HXT7, arose multiple times independently and are individually adaptive. However, when forced to co-occur, the double mutant has a lower fitness than either single mutant and even the wild-type strain. This negative interaction forces these two mutations to remain mutually exclusive during the experimental evolution and results in a rugged fitness landscape, where genetic constraint prevents lineages carrying the MTH1 mutation from reaching the higher fitness peak of HXT6/HXT7. These results show that genetic interactions are central in shaping a very active portion of this fitness landscape.
PMCID: PMC3084205  PMID: 21552329
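As a minimal sketch of the reciprocal sign epistasis described in entry 18, the check below compares the effect of each mutation on its own with its effect in the background of the other mutation. The function name and the fitness values are hypothetical illustrations, not data or methods from the paper.

```python
def classify_epistasis(w_wt, w_a, w_b, w_ab):
    """Classify sign epistasis from four relative fitness values.

    w_wt: wild type, w_a / w_b: single mutants, w_ab: double mutant.
    All values here are hypothetical; none are taken from the paper.
    """
    # Effect of mutation A alone, and in the background that carries B.
    a_alone = w_a - w_wt
    a_with_b = w_ab - w_b
    # Effect of mutation B alone, and in the background that carries A.
    b_alone = w_b - w_wt
    b_with_a = w_ab - w_a

    # A sign flip means the mutation helps in one background and hurts in the other.
    a_flips = a_alone * a_with_b < 0
    b_flips = b_alone * b_with_a < 0

    if a_flips and b_flips:
        return "reciprocal sign epistasis"
    if a_flips or b_flips:
        return "sign epistasis"
    return "no sign epistasis"


# Both single mutants beat the wild type, but the double mutant is worse than all three.
print(classify_epistasis(1.0, 1.1, 1.2, 0.9))
```

In the abstract's terms, the first two arguments after the wild type would stand for the MTH1 and HXT6/HXT7 single mutants, and a double mutant less fit than either single mutant and the wild type produces the reciprocal case.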
19.  Comparison of the Complete Protein Sets of Worm and Yeast: Orthology and Divergence 
Science (New York, N.Y.)  1998;282(5396):2022-2028.
Comparative analysis of predicted protein sequences encoded by the genomes of Caenorhabditis elegans and Saccharomyces cerevisiae suggests that most of the core biological functions are carried out by orthologous proteins (proteins of different species that can be traced back to a common ancestor) that occur in comparable numbers. The specialized processes of signal transduction and regulatory control that are unique to the multicellular worm appear to use novel proteins, many of which re-use conserved domains. Major expansion of the number of some of these domains seen in the worm may have contributed to the advent of multicellularity. The proteins conserved in yeast and worm are likely to have orthologs throughout eukaryotes; in contrast, the proteins unique to the worm may well define metazoans.
PMCID: PMC3057080  PMID: 9851918
20.  GO::TermFinder—open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes 
Bioinformatics (Oxford, England)  2004;20(18):3710-3715.
GO::TermFinder comprises a set of object-oriented Perl modules for accessing Gene Ontology (GO) information and evaluating and visualizing the collective annotation of a list of genes to GO terms. It can be used to draw conclusions from microarray and other biological data, calculating the statistical significance of each annotation. GO::TermFinder can be used on any system on which Perl can be run, either as a command line application, in single or batch mode, or as a web-based CGI script.
The full source code and documentation for GO::TermFinder are freely available from
PMCID: PMC3037731  PMID: 15297299
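The abstract for entry 20 does not say which statistical test is used to score an annotation. One common choice for this kind of enrichment question is the hypergeometric distribution, and the sketch below assumes it. The function and parameter names are hypothetical and are not part of the GO::TermFinder Perl API.

```python
from math import comb


def enrichment_p_value(population, annotated, list_size, hits):
    """Upper-tail hypergeometric p-value (assumed test, see lead-in).

    population: number of genes in the background set
    annotated:  background genes annotated to the GO term
    list_size:  number of genes in the list of interest
    hits:       genes in the list annotated to the GO term
    Returns the probability of seeing at least `hits` annotated genes
    when `list_size` genes are drawn at random without replacement.
    """
    upper = min(list_size, annotated)
    # Count the draws that contain exactly i annotated genes, for i >= hits.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, list_size)


# Hypothetical numbers: 10 genes in total, 5 annotated, a list of 5 genes, all 5 annotated.
print(enrichment_p_value(10, 5, 5, 5))
```

A real analysis would also correct for testing many GO terms at once; that step is left out of this sketch.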
21.  Gene Ontology: tool for the unification of biology 
Nature genetics  2000;25(1):25-29.
Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web are being constructed: biological process, molecular function and cellular component.
PMCID: PMC3037419  PMID: 10802651
22.  Whole-Genome Comparison Reveals Novel Genetic Elements That Characterize the Genome of Industrial Strains of Saccharomyces cerevisiae 
PLoS Genetics  2011;7(2):e1001287.
Human intervention has subjected the yeast Saccharomyces cerevisiae to multiple rounds of independent domestication and thousands of generations of artificial selection. As a result, this species comprises a genetically diverse collection of natural isolates as well as domesticated strains that are used in specific industrial applications. However, the scope of genetic diversity that was captured during the domesticated evolution of the industrial representatives of this important organism remains to be determined. To begin to address this, we have produced whole-genome assemblies of six commercial strains of S. cerevisiae (four wine and two brewing strains). These represent the first genome assemblies produced from S. cerevisiae strains in their industrially used forms and the first high-quality assemblies for S. cerevisiae strains used in brewing. By comparing these sequences to six existing high-coverage S. cerevisiae genome assemblies, clear signatures were found that defined each industrial class of yeast. This genetic variation comprised both single nucleotide polymorphisms and large-scale insertions and deletions, with the latter often being associated with ORF heterogeneity between strains. This included the discovery of more than twenty probable genes that had not been identified previously in the S. cerevisiae genome. Comparison of this large number of S. cerevisiae strains also enabled the characterization of a cluster of five ORFs that have integrated into the genomes of the wine and bioethanol strains on multiple occasions and at diverse genomic locations, via a mechanism that appears to involve the resolution of a circular DNA intermediate. This work suggests that, despite the scrutiny that has been directed at the yeast genome, there remains a significant reservoir of ORFs and novel modes of genetic transmission that may have significant phenotypic impact in this important model and industrial species.
Author Summary
The yeast S. cerevisiae has been associated with human activity for thousands of years in industries such as baking, brewing, and winemaking. During this time, humans have effectively domesticated this microorganism, with different industries selecting for specific desirable phenotypic traits. This has resulted in the species S. cerevisiae comprising a genetically diverse collection of individual strains that are often suited to very specific roles (e.g. wine strains produce wine but not beer and vice versa). In order to understand the genetic differences that underpin these diverse industrial characteristics, we have sequenced the genomes of six industrial strains of S. cerevisiae that comprise four strains used in commercial wine production and two strains used in beer brewing. By comparing these genome sequences to existing S. cerevisiae genome sequences from laboratory, pathogenic, bioethanol, and “natural” isolates, we were able to identify numerous genetic differences among these strains including the presence of novel open reading frames and genomic rearrangements, which may provide the basis for the phenotypic differences observed among these strains.
PMCID: PMC3033381  PMID: 21304888
23.  Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads 
BMC Genomics  2010;11:663.
Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. While high throughput mRNA sequencing (RNA-Seq) has emerged as a powerful tool for addressing these problems, its success is dependent upon the availability and quality of reference genome sequences, thus limiting the organisms to which it can be applied.
Here, we describe Rnnotator, an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. We have applied the Rnnotator assembly pipeline to two yeast transcriptomes and compared the results to the reference gene catalogs of these organisms. The contigs produced by Rnnotator are highly accurate (95%) and reconstruct full-length genes for the majority of the existing gene models (54.3%). Furthermore, our analyses revealed many novel transcribed regions that are absent from well annotated genomes, suggesting Rnnotator serves as a complementary approach to analysis based on a reference genome for comprehensive transcriptomics.
These results demonstrate that the Rnnotator pipeline is able to reconstruct full-length transcripts in the absence of a complete reference genome.
PMCID: PMC3152782  PMID: 21106091
24.  Annotare—a tool for annotating high-throughput biomedical investigations and resulting data 
Bioinformatics  2010;26(19):2470-2471.
Summary: Computational methods in molecular biology will increasingly depend on standards-based annotations that describe biological experiments in an unambiguous manner. Annotare is a software tool that enables biologists to easily annotate their high-throughput experiments, biomaterials and data in a standards-compliant way that facilitates meaningful search and analysis.
Availability and Implementation: Annotare is available under the terms of the open-source MIT License. It has been tested on both Mac and Windows.
PMCID: PMC2944206  PMID: 20733062
25.  A Genome-Wide Analysis Reveals No Nuclear Dobzhansky-Muller Pairs of Determinants of Speciation between S. cerevisiae and S. paradoxus, but Suggests More Complex Incompatibilities 
PLoS Genetics  2010;6(7):e1001038.
The Dobzhansky-Muller (D-M) model of speciation by genic incompatibility is widely accepted as the primary explanation for interspecific postzygotic isolation. Since the introduction of this model, there have been theoretical and experimental data supporting the existence of such incompatibilities. However, speciation genes have been largely elusive, with only a handful of candidate genes identified in a few organisms. The Saccharomyces sensu stricto yeasts, which have small genomes and can mate interspecifically to produce sterile hybrids, are thus an ideal model for studying postzygotic isolation. Among them, only a single D-M pair, comprising a mitochondrially targeted product of a nuclear gene and a mitochondrially encoded locus, has been found. Thus far, no D-M pair of nuclear genes has been identified between any sensu stricto yeasts. We report here the first detailed genome-wide analysis of rare meiotic products from an otherwise sterile hybrid and show that no classic D-M pairs of speciation genes exist between the nuclear genomes of the closely related yeasts S. cerevisiae and S. paradoxus. Instead, our analyses suggest that more complex interactions, likely involving multiple loci having weak effects, may be responsible for their postzygotic separation. The lack of a nuclear-encoded classic D-M pair between these two yeasts, together with the existence of multiple loci that may each exert a small effect through complex interactions, suggests that initial speciation events might not always be mediated by D-M pairs. An alternative explanation may be that the accumulation of polymorphisms leads to gamete inviability due to the activities of anti-recombination mechanisms and/or incompatibilities between the species' transcriptional and metabolic networks, with no single pair at least initially being responsible for the incompatibility. After such a speciation event, it is possible that one or more D-M pairs might subsequently arise following isolation.
Author Summary
Species are defined such that organisms of the same species can produce fertile offspring, whereas organisms of different species are either unable to mate, or when they do, they produce inviable or sterile progeny. A well-known pair of species that can mate yet produce sterile offspring is the horse and donkey, which produce an infertile hybrid, the mule. A long-standing idea for the species barrier is that when certain pairs of genes from the two different species are combined, the genes can no longer function properly, thus causing death or sterility. Identification of these incompatible genes may allow us to determine how organisms form distinct species, and understand the process of speciation itself. We used two closely related yeasts to look for these incompatible genes by isolating rare viable hybrid offspring, and looking for excluded gene combinations. We did not find any pairs of incompatible genes, but instead found that there appear to be more than two genes involved in such incompatibilities. We speculate that the accumulation of large numbers of sequence differences in their DNA may cause defects in how genes are controlled in hybrids, causing these two yeasts to be independent species.
PMCID: PMC2912382  PMID: 20686707
