Cryptococcus, a major cause of disseminated infections in immunocompromised patients, kills over 600,000 people per year worldwide. Genes involved in the virulence of this meningitis-causing fungus are being characterized at an increasing rate, and to date at least 648 Cryptococcus gene names have been published. However, these data are scattered throughout the literature and are difficult to find. Furthermore, conflicts in locus identification exist: some named genes have subsequently been published under new names, and some names associated with one locus have been used for another locus. To avoid these conflicts and to provide a central source of Cryptococcus gene information, we collected all published Cryptococcus gene names from the scientific literature, associated them with standard Cryptococcus locus identifiers, and incorporated them into FungiDB (www.fungidb.org). FungiDB is a panfungal genome database that collects gene information and functional data and provides search tools for 61 species of fungi and oomycetes. We applied these published names to a manually curated ortholog set spanning all Cryptococcus species and strains currently in FungiDB, including Cryptococcus neoformans var. neoformans strains JEC21 and B-3501A, C. neoformans var. grubii strain H99, and Cryptococcus gattii strains R265 and WM276, and we wrote brief descriptions of their functions. We also compiled a protocol for gene naming that summarizes guidelines proposed by members of the Cryptococcus research community. Centralizing genomic and literature-based information for Cryptococcus at FungiDB will help researchers communicate about genes of interest, such as those related to virulence, and will further facilitate research on the pathogen.
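Both kinds of naming conflict described above (one locus published under several names, and one name applied to several loci) can be flagged mechanically once every published name is paired with a locus identifier. The Python sketch below is a hypothetical illustration, not FungiDB's curation pipeline. It assumes the literature has already been reduced to (gene name, locus ID) pairs. The gene names and `LOCUS_` identifiers in the example are placeholders, not real Cryptococcus loci.

```python
from collections import defaultdict


def find_naming_conflicts(pairs):
    """Return (names used for more than one locus, loci published under more than one name).

    `pairs` is an iterable of (gene_name, locus_id) tuples collected from the literature.
    """
    loci_by_name = defaultdict(set)
    names_by_locus = defaultdict(set)
    for name, locus in pairs:
        loci_by_name[name].add(locus)
        names_by_locus[locus].add(name)

    # One name attached to several loci: the symbol was reused for a different gene.
    shared_names = {n: sorted(loci) for n, loci in loci_by_name.items() if len(loci) > 1}
    # One locus carrying several names: the gene was renamed in a later publication.
    renamed_loci = {l: sorted(names) for l, names in names_by_locus.items() if len(names) > 1}
    return shared_names, renamed_loci


# Hypothetical example data; names and locus IDs are placeholders.
published = [
    ("ABC1", "LOCUS_0001"),
    ("XYZ9", "LOCUS_0001"),  # same locus later published under a new name
    ("DEF2", "LOCUS_0002"),
    ("DEF2", "LOCUS_0003"),  # same name reused for another locus
]
shared, renamed = find_naming_conflicts(published)
print(shared)   # {'DEF2': ['LOCUS_0002', 'LOCUS_0003']}
print(renamed)  # {'LOCUS_0001': ['ABC1', 'XYZ9']}
```

Each flagged entry still needs manual review against the original publications before a standard name is assigned.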
A major goal of genetics is to define the relationship between phenotype and genotype, while a major goal of ecology is to identify the rules that govern community assembly. Achieving these goals by analyzing natural systems can be difficult, as selective pressures create dynamic fitness landscapes that vary in both space and time. Laboratory experimental evolution makes it possible to control the variables that shape fitness landscapes, helping to achieve both goals. We previously showed that a clonal population of E. coli experimentally evolved under continuous glucose limitation gives rise to a genetically diverse community consisting of one clone, CV103, that best scavenges but incompletely utilizes the limiting resource, and others, CV101 and CV116, that consume its overflow metabolites. Because this community can be disassembled and reassembled, and involves cooperative interactions that are stable over time, its genetic diversity is sustained by clonal reinforcement rather than by clonal interference. To understand the genetic factors that produce this outcome, and to illuminate the community's underlying physiology, we sequenced the genomes of ancestral and evolved clones. We identified ancestral mutations in intermediary metabolism that may have predisposed the population to evolve metabolic interdependence. Phylogenetic reconstruction indicates that the lineages that gave rise to this community diverged early, as CV103 shares only one single nucleotide polymorphism (SNP) with the other evolved clones. Underlying CV103's phenotype, we identified a set of mutations that likely enhance glucose scavenging and maintain redox balance, but may do so at the expense of carbon that is excreted as overflow metabolites.
Because these overflow metabolites serve as growth substrates that are differentially accessible to the other community members, and because the scavenging lineage shares only one SNP with these other clones, we conclude that this lineage likely served as an “engine” generating diversity by creating new metabolic niches, but not the occupants themselves.
The variability of natural systems makes it difficult to deduce how organisms' genotypes manifest as phenotypes and how communities of interacting organisms arise. Laboratory experimental evolution allows us to control this variation. We previously showed that a population of E. coli that originated from a single clone and was cultured in the presence of a single limiting resource evolves into a stable, three-membered community in which one clone excretes metabolites that the others use as carbon sources. To discern the genetic factors that produce this outcome and to illuminate the community's physiology, we sequenced the genomes of the ancestral and evolved clones. We identified mutations in the ancestor that may have predisposed the population to evolve cross-feeding. We found that the lineages that gave rise to the community diverged early. The numerically dominant lineage, which best scavenges the limiting glucose, does so because of adaptive mutations that enhance glucose uptake but favor fermentative over respiratory pathways, resulting in the excretion of overflow metabolites. Because this clone produces secondary resources that sustain the other community members, and because it shares only one mutation with them, we conclude that it is an “engine” that generates diversity by creating new niches, but not the occupants themselves.
Manual extraction of information from the biomedical literature, or biocuration, is the central methodology used to construct many biological databases. For example, the UniProt protein database, the EcoCyc Escherichia coli database and the Candida Genome Database (CGD) are all based on biocuration. Life science researchers use biological databases extensively: as online encyclopedias, as aids in interpreting new experimental data and as gold standards for developing new bioinformatics algorithms. Although manual curation has been assumed to be highly accurate, we are aware of only one previous study of biocuration accuracy. We assessed the accuracy of EcoCyc and CGD by manually selecting curated assertions within randomly chosen EcoCyc and CGD gene pages and then checking whether the data in the referenced publications supported those assertions. A database assertion was considered to be in error if it could not be found in the publication cited for it. We identified 10 errors in the 633 facts that we validated across the two databases, for an overall error rate of 1.58%, with individual error rates of 1.82% for CGD and 1.40% for EcoCyc. These data suggest that manual curation of the experimental literature by Ph.D.-level scientists is highly accurate.
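The error rates above are simple proportions: unsupported assertions divided by facts validated. The sketch below reproduces the overall figure from the counts given in the text (10 errors in 633 facts). The per-database counts behind the 1.82% and 1.40% figures are not stated, so they are not recomputed here. The function name is hypothetical.

```python
def error_rate_percent(errors, facts_checked):
    """Percentage of validated assertions not supported by the cited publication."""
    if facts_checked <= 0:
        raise ValueError("facts_checked must be positive")
    return 100.0 * errors / facts_checked


print(f"{error_rate_percent(10, 633):.2f}%")  # 1.58%
```

The same function applies to each database separately once its own error and fact counts are known.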
Characterization of cell cycle–regulated transcripts in U2OS cells identified 1871 unique genes. FOXM1 targets were identified by ChIP-seq, and novel targets with peak expression in G2/M and S phases were verified using a real-time luciferase assay. ChIP-seq data were used to map the binding of cell cycle transcriptional regulators to cell cycle–regulated genes in U2OS cells.
We identify the cell cycle–regulated mRNA transcripts genome-wide in the osteosarcoma-derived U2OS cell line. This yields 2140 transcripts mapping to 1871 unique cell cycle–regulated genes that show periodic oscillations across multiple synchronous cell cycles. We identify genomic loci bound by the G2/M transcription factor FOXM1 by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) and associate these loci with cell cycle–regulated genes. FOXM1 is bound to cell cycle–regulated genes with peak expression in both S and G2/M phases. Using a real-time luciferase assay in live cells, we show that the ChIP-seq genomic loci are responsive to FOXM1: FOXM1 strongly activates promoters of G2/M phase genes and weakly activates those of genes induced in S phase. We analyzed ChIP-seq data for a panel of cell cycle transcription factors (E2F1, E2F4, E2F6, and GABPA) from the Encyclopedia of DNA Elements (ENCODE), together with ChIP-seq data for the DREAM complex. This analysis shows that multiple cell cycle transcription factors bind a set of core cell cycle genes regulated in both U2OS and HeLa cells. These data identify the cell cycle–regulated genes in a second cancer-derived cell line and provide a comprehensive picture of the transcriptional regulatory systems controlling periodic gene expression in the human cell division cycle.
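One simple way to score whether a transcript oscillates with the cell cycle is to correlate its time course with cosine waves of the expected period, trying several phase offsets. This is a generic sketch, not necessarily the method used to call the 1871 genes. The 2-hour sampling interval and 24-hour period in the example are hypothetical values chosen for illustration. The best-scoring phase also gives a rough estimate of when expression peaks, so it could in principle be used to group genes by peak phase.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def periodicity_score(values, times, period, n_phases=24):
    """Best correlation between a time course and a cosine of the given period.

    Tries n_phases evenly spaced phase offsets and returns (best_r, best_phase).
    """
    best_r, best_phase = -1.0, 0.0
    for k in range(n_phases):
        phase = period * k / n_phases
        template = [math.cos(2 * math.pi * (t - phase) / period) for t in times]
        r = pearson(values, template)
        if r > best_r:
            best_r, best_phase = r, phase
    return best_r, best_phase


# Hypothetical design: samples every 2 h for 48 h, 24-h cycle length.
times = list(range(0, 48, 2))
period = 24.0
oscillating = [math.cos(2 * math.pi * (t - 6) / period) for t in times]
trend = [float(t) for t in times]

r, phase = periodicity_score(oscillating, times, period)
print(round(r, 3), phase)  # 1.0 6.0
print(periodicity_score(trend, times, period)[0] < 0.6)  # True
```

The steadily rising series scores well below the oscillating one, which is the behavior a periodicity filter needs.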
The Gene Ontology (GO) is a structured controlled vocabulary developed to describe the roles and locations of gene products consistently, in a way that can be shared across organisms. The unicellular fungus Candida albicans is similar in many ways to the model organism Saccharomyces cerevisiae but, as both a commensal and a pathogen of humans, differs greatly from it in lifestyle. With an expanding at-risk population of immunosuppressed patients, increased use of invasive medical procedures, the increasing prevalence of drug resistance, and the emergence of additional Candida species as serious pathogens, it has never been more critical to improve our understanding of Candida biology to guide the development of better treatments. In this brief review, we examine the importance of GO in the annotation of C. albicans gene products, with a focus on those involved in pathogenesis. We also discuss how sequence information combined with GO facilitates the transfer of knowledge across related species, and the challenges and opportunities that such an approach presents.
PortEco (http://porteco.org) aims to collect, curate and provide data and analysis tools to support basic biological research in Escherichia coli (and eventually other bacterial systems). PortEco is implemented as a ‘virtual’ model organism database that provides a single unified interface to the user, while integrating information from a variety of sources. The main focus of PortEco is to enable broad use of the growing number of high-throughput experiments available for E. coli, and to leverage community annotation through the EcoliWiki and GONUTS systems. Currently, PortEco includes curated data from hundreds of genome-wide RNA expression studies, from high-throughput phenotyping of single-gene knockouts under hundreds of annotated conditions, from chromatin immunoprecipitation experiments for tens of different DNA-binding factors and from ribosome profiling experiments that yield insights into protein expression. Conditions have been annotated with a consistent vocabulary, and data have been consistently normalized to enable users to find, compare and interpret relevant experiments. PortEco includes tools for data analysis, including clustering, enrichment analysis and exploration via genome browsers. PortEco search and data analysis tools are extensively linked to the curated gene, metabolic pathway and regulation content at its sister site, EcoCyc.
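Enrichment analysis of the kind PortEco offers typically asks whether a gene list, such as a cluster, contains more genes annotated to a term than chance would predict. A common choice is the one-sided hypergeometric test, sketched below in Python. PortEco's own implementation is not described here, so treat this as a generic illustration. The gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(hits_in_list, list_size, annotated_in_genome, genome_size):
    """One-sided hypergeometric P(X >= hits_in_list).

    X counts annotated genes in a random list of list_size genes drawn without
    replacement from genome_size genes, of which annotated_in_genome carry the term.
    """
    upper = min(list_size, annotated_in_genome)
    total = comb(genome_size, list_size)
    tail = sum(
        comb(annotated_in_genome, k) * comb(genome_size - annotated_in_genome, list_size - k)
        for k in range(hits_in_list, upper + 1)
    )
    return tail / total


# Hypothetical numbers: a 50-gene cluster contains 12 genes annotated to a term
# that covers 200 of 4000 genes in the genome (about 2.5 expected by chance).
p = enrichment_p_value(hits_in_list=12, list_size=50, annotated_in_genome=200, genome_size=4000)
print(f"P = {p:.2e}")
```

Exact integer arithmetic via `math.comb` avoids overflow for genome-scale counts. When many terms are tested, the resulting p-values would still need multiple-testing correction.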
Molecular signaling networks are ubiquitous across life and likely evolved to allow organisms to sense and respond to change in dynamic environments. Few examples of the dispensability of signaling networks exist, and it remains unclear whether such networks are an essential feature of a highly adapted biological system. Here, we show that signaling network function carries a fitness cost in yeast evolving in a constant environment. We performed whole-genome, whole-population Illumina sequencing on replicate evolution experiments. We found that the major theme of adaptive evolution in a constant environment is the disruption of signaling networks responsible for regulating the response to environmental perturbations. Over half of all identified mutations occurred in three major signaling networks that regulate growth control: glucose signaling, Ras/cAMP/PKA and HOG. These mutations cause a loss of environmental sensitivity that is reproducible across experiments. However, adaptive clones show reduced viability under starvation conditions, demonstrating an evolutionary tradeoff. These mutations are beneficial in an environment with a constant and predictable nutrient supply, likely because they result in constitutive growth, but they reduce fitness in an environment where the nutrient supply is not constant. Our results are a clear example of the myopic nature of evolution: a loss of environmental sensitivity in a constant environment is adaptive in the short term but maladaptive should the environment change.
When a population of organisms faces a selective pressure, such as a limiting nutrient, randomly arising mutations may confer a fitness benefit on the individuals that carry them. If such an individual reproduces before it is lost from the population, the frequency of its mutation may increase. Over time, many beneficial mutations will arise in a large population, but few high-resolution experiments have tracked the frequencies of such mutations in an evolving population. We evolved populations of baker's yeast in a constant environment with limiting amounts of sugar. We then used DNA sequencing to identify mutations that reached a frequency of at least 1% in the population and tracked them over time. We identified 120 mutations across three experiments and determined that the genes and pathways that gained beneficial mutations were largely reproducible across experiments. Many of the mutations caused the loss of signaling pathways that normally sense a changing environment and allow the cell to respond appropriately. When these mutant cells faced uncertain environments, the mutations proved deleterious. Environmental sensing must carry a fitness cost in a constant environment, but it is essential in a changing one.
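The 1% threshold can be illustrated with a small sketch. It converts per-timepoint read counts into allele frequencies and keeps only mutations that reach the threshold at some timepoint. This is a deliberate simplification: it ignores sequencing error and variant calling entirely. The data layout is hypothetical, and the mutation names and counts are placeholders.

```python
def track_mutations(counts_by_timepoint, min_freq=0.01):
    """Return frequency trajectories for mutations that ever reach min_freq.

    counts_by_timepoint maps generation -> {mutation_id: (alt_reads, depth)}.
    The result maps mutation_id -> [(generation, frequency), ...] in generation order.
    """
    trajectories = {}
    for generation in sorted(counts_by_timepoint):
        for mutation, (alt_reads, depth) in counts_by_timepoint[generation].items():
            freq = alt_reads / depth if depth else 0.0
            trajectories.setdefault(mutation, []).append((generation, freq))
    return {
        mutation: traj
        for mutation, traj in trajectories.items()
        if max(f for _, f in traj) >= min_freq
    }


# Hypothetical read counts (alt reads, total depth) at three sampled generations.
samples = {
    0: {"mutA": (0, 200), "mutB": (1, 250)},
    100: {"mutA": (30, 300), "mutB": (1, 240)},
    200: {"mutA": (150, 300), "mutB": (0, 260)},
}
print(track_mutations(samples))
# {'mutA': [(0, 0.0), (100, 0.1), (200, 0.5)]}
```

A mutation is kept with its full trajectory, including timepoints below the threshold, so its rise can be followed from the start.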
The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available web-based resource that was designed for Aspergillus researchers and is also a valuable source of information for the entire fungal research community. In addition to being a repository and central point of access for genome, transcriptome and polymorphism data, AspGD hosts a comprehensive comparative genomics toolbox that facilitates the exploration of precomputed orthologs among the 20 currently available Aspergillus genomes. AspGD curators perform gene product annotation based on review of the literature for four key Aspergillus species: Aspergillus nidulans, Aspergillus oryzae, Aspergillus fumigatus and Aspergillus niger. We have iteratively improved the structural annotation of Aspergillus genomes through the analysis of publicly available transcription data, mostly expressed sequence tags (ESTs), as described in a previous NAR Database article (Arnaud et al. 2012). In this update, we report substantive structural annotation improvements for the A. nidulans, A. oryzae and A. fumigatus genomes based on recently available RNA-Seq data. Over 26 000 loci were updated across these species. Although these updates primarily comprise the addition and extension of untranslated regions (UTRs), the new analysis also enabled over 1000 modifications affecting the coding sequence of genes in each target genome.
The Candida Genome Database (CGD, http://www.candidagenome.org/) is a freely available online resource that provides gene, protein and sequence information for multiple Candida species, along with web-based tools for accessing, analyzing and exploring these data. The goal of CGD is to facilitate and accelerate research into Candida pathogenesis and biology. The CGD Web site is organized around Locus pages, which display information collected about individual genes. Locus pages have multiple tabs for accessing different types of information; the default Summary tab provides an overview of the gene name, aliases, phenotype and Gene Ontology curation, whereas other tabs display more in-depth information, including protein product details for coding genes, notes on changes to the sequence or structure of the gene and a comprehensive reference list. Here, in this update to previous NAR Database articles featuring CGD, we describe a new tab that we have added to the Locus page, entitled the Homology Information tab, which displays phylogeny and gene similarity information for each locus.
Candida albicans is an opportunistic fungal pathogen that can cause disseminated infection in patients with indwelling catheters or other implanted medical devices. A common resident of the human microbiome, C. albicans responds to environmental signals, such as cell contact with catheter materials and exposure to serum or CO2, by triggering the expression of a variety of traits, some of which are known to contribute to its pathogenic lifestyle. Such traits include adhesion, biofilm formation, filamentation, white-to-opaque (W-O) switching, and two recently described phenotypes, finger and tentacle formation. Under distinct sets of environmental conditions and in specific cell types (mating type-like a [MTLa]/alpha cells, MTL homozygotes, or daughter cells), C. albicans utilizes (or reutilizes) a single signal transduction pathway—the Ras pathway—to affect these phenotypes. Ras1, Cyr1, Tpk2, and Pde2, the proteins of the Ras signaling pathway, are the only nontranscriptional regulatory proteins that are known to be essential for regulating all of these processes. How does C. albicans utilize this one pathway to regulate all of these phenotypes? The regulation of distinct and yet related processes by a single, evolutionarily conserved pathway is accomplished through the use of downstream transcription factors that are active under specific environmental conditions and in different cell types. In this minireview, we discuss the role of Ras signaling pathway components and Ras pathway-regulated transcription factors as well as the transcriptional regulatory networks that fine-tune gene expression in diverse biological contexts to generate specific phenotypes that impact the virulence of C. albicans.
Candida albicans is a ubiquitous opportunistic fungal pathogen that afflicts immunocompromised human hosts. With rare and transient exceptions the yeast is diploid, yet despite its clinical relevance the respective sequences of its two homologous chromosomes have not been completely resolved.
We construct a phased diploid genome assembly by deep sequencing a standard laboratory wild-type strain and a panel of strains homozygous for particular chromosomes. The assembly has 700-fold coverage on average, allowing extensive revision and expansion of the number of known SNPs and indels. This phased genome significantly enhances the sensitivity and specificity of allele-specific expression measurements by enabling pooling and cross-validation of signal across multiple polymorphic sites. Additionally, the diploid assembly reveals pervasive and unexpected patterns in allelic differences between homologous chromosomes. Firstly, we see striking clustering of indels, concentrated primarily in the repeat sequences in promoters. Secondly, both indels and their repeat-sequence substrate are enriched near replication origins. Finally, we reveal an intimate link between repeat sequences and indels, which argues that repeat length is under selective pressure for most eukaryotes. This connection is described by a concise one-parameter model that explains repeat-sequence abundance in C. albicans as a function of the indel rate, and provides a general framework to interpret repeat abundance in species ranging from bacteria to humans.
The phased genome assembly and insights into repeat plasticity will be valuable for better understanding allele-specific phenomena and genome evolution.
Haplotype; Phasing; Indel; Microsatellite; Homopolymer; Repeat
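Phasing is what makes pooling possible. Once every heterozygous site in a gene is assigned to homolog A or homolog B, read counts from all sites can be summed per homolog before testing for allelic imbalance. The sketch below assumes that no read spans more than one site, so no read is counted twice. It uses an exact two-sided binomial test against a 50:50 expectation. This is a hypothetical illustration, not the statistical procedure used in the study, and the counts are invented.

```python
from math import comb


def pooled_haplotype_counts(site_counts):
    """Sum read counts per homolog across phased heterozygous sites of one gene.

    site_counts: list of (reads_hapA, reads_hapB) per SNP, oriented by the phased
    assembly so that "hapA" always refers to the same homologous chromosome.
    """
    hap_a = sum(a for a, _ in site_counts)
    hap_b = sum(b for _, b in site_counts)
    return hap_a, hap_b


def binomial_two_sided_p(a, b):
    """Exact two-sided binomial test of a vs. b reads against a 50:50 expectation."""
    n = a + b
    k = min(a, b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical gene with three phased SNPs, each showing the same 2:1 bias.
sites = [(12, 6), (12, 6), (12, 6)]
hap_a, hap_b = pooled_haplotype_counts(sites)
print(hap_a, hap_b)  # 36 18
print(binomial_two_sided_p(12, 6) > 0.05, binomial_two_sided_p(hap_a, hap_b) < 0.05)  # True True
```

No single site reaches significance on its own, but the pooled counts do. This is the gain in sensitivity that a phased assembly enables, because without phasing the per-site counts could not be oriented consistently before summing.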
We investigated the genetic causes of ethanol tolerance by characterizing mutations selected in Saccharomyces cerevisiae W303-1A under the selective pressure of ethanol. W303-1A was subjected to three rounds of turbidostat selection in medium supplemented with increasing amounts of ethanol. By the end of selection, the growth rate of the culture had increased from 0.029 h⁻¹ to 0.32 h⁻¹. Unlike the progenitor strain, all yeast cells isolated from this population could form colonies within six days on medium supplemented with 7% ethanol, which was our definition of ethanol tolerance. Several clones selected from all three stages of selection could form dense colonies within two days on solid medium supplemented with 9% ethanol. We sequenced the whole genomes of six clones, identified mutations responsible for ethanol tolerance, and tested 13 additional clones for the presence of similar mutations. In 15 of the 19 tolerant clones, the stop codon in ssd1-d was replaced with an amino acid-encoding codon. Three other clones contained one of two mutations in UTH1, and one clone contained mutations in neither SSD1 nor UTH1. We showed that the mutations in SSD1 and UTH1 increased the tolerance of the cell wall to zymolyase, and we conclude that cell wall stability is a major factor in increased ethanol tolerance.
ethanol tolerance; SSD1; UTH1; turbidostat; cell wall
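Growth rates are easier to compare as doubling times. Assuming the reported values are specific growth rates μ for exponential growth, N(t) = N0·e^(μt), the doubling time is ln 2 / μ. The short Python sketch below applies this to the two rates reported for the turbidostat culture.

```python
import math


def doubling_time_hours(mu_per_hour):
    """Doubling time (h) for exponential growth N(t) = N0 * exp(mu * t)."""
    return math.log(2) / mu_per_hour


for mu in (0.029, 0.32):
    print(f"mu = {mu} per h -> doubling time {doubling_time_hours(mu):.1f} h")
# mu = 0.029 per h -> doubling time 23.9 h
# mu = 0.32 per h -> doubling time 2.2 h
```

Under that assumption, selection shortened the doubling time from roughly a day to just over two hours, an approximately 11-fold increase in growth rate.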
Secondary metabolite production, a hallmark of filamentous fungi, is an expanding area of research for the Aspergilli. These compounds are potent chemicals, ranging from deadly toxins to therapeutic antibiotics to potential anti-cancer drugs. The genome sequences for multiple Aspergilli have been determined, and provide a wealth of predictive information about secondary metabolite production. Sequence analysis and gene overexpression strategies have enabled the discovery of novel secondary metabolites and the genes involved in their biosynthesis. The Aspergillus Genome Database (AspGD) provides a central repository for gene annotation and protein information for Aspergillus species. These annotations include Gene Ontology (GO) terms, phenotype data, gene names and descriptions and they are crucial for interpreting both small- and large-scale data and for aiding in the design of new experiments that further Aspergillus research.
We have manually curated Biological Process GO annotations for all genes in AspGD with recorded functions in secondary metabolite production, adding new GO terms that specifically describe each secondary metabolite. We then leveraged these new annotations to predict roles in secondary metabolism for genes lacking experimental characterization. As a starting point for manually annotating Aspergillus secondary metabolite gene clusters, we used antiSMASH (antibiotics and Secondary Metabolite Analysis SHell) and SMURF (Secondary Metabolite Unknown Regions Finder) algorithms to identify potential clusters in A. nidulans, A. fumigatus, A. niger and A. oryzae, which we subsequently refined through manual curation.
This set of 266 manually curated secondary metabolite gene clusters will facilitate the investigation of novel Aspergillus secondary metabolites.
Aspergillus; Gene clusters; Gene Ontology; Genome annotation; Secondary metabolism; Sybil
Genome rearrangements are associated with eukaryotic evolutionary processes ranging from tumorigenesis to speciation. Rearrangements are especially common following interspecific hybridization, and some of these could be expected to have strong selective value. To test this expectation, we created de novo interspecific yeast hybrids between two diverged but largely syntenic Saccharomyces species, S. cerevisiae and S. uvarum, then experimentally evolved them under continuous ammonium limitation. We discovered that a characteristic interspecific genome rearrangement arose multiple times in independently evolved populations. We uncovered nine different breakpoints, all occurring in a narrow ∼1-kb region of chromosome 14, and all producing an “interspecific fusion junction” within the MEP2 gene coding sequence, such that the 5′ portion derives from S. cerevisiae and the 3′ portion derives from S. uvarum. In most cases the rearrangements altered both chromosomes. On one side, a several-kb region of S. uvarum was, in effect, introgressed into an otherwise intact S. cerevisiae chromosome 14. On the other, the homeologous S. uvarum chromosome 14 experienced an interspecific reciprocal translocation at the same breakpoint within MEP2, yielding a chimaeric chromosome. Together, these events leave the cell with two MEP2 fusion genes having identical breakpoints. MEP2 encodes a high-affinity ammonium permease, MEP2 fusion genes arise repeatedly under ammonium limitation, and three independent evolved isolates carrying MEP2 fusion genes are each more fit than their common ancestor. Given these observations, the novel MEP2 fusion genes are very likely adaptive under ammonium limitation. Our results suggest that, when homoploid hybrids form, the admixture of two genomes enables swift and otherwise unavailable evolutionary innovations.
Furthermore, the architecture of the MEP2 rearrangement suggests a model for rapid introgression, a phenomenon seen in numerous eukaryotic phyla, that does not require repeated backcrossing to one of the parental species.
Interspecific hybridization occurs when two different species mate and produce viable offspring. While hybrid offspring are usually sterile, like the mule, which results from a horse–donkey mating, sometimes they are fertile, creating new species. Indeed, many plant and animal species have arisen via this mechanism. Because interspecific hybridization occurs between different yeast species, and because they are such tractable models, yeast are ideally suited for experimentally investigating the genomic consequences of interspecific hybridization. We created an interspecific yeast hybrid by crossing S. cerevisiae and S. uvarum, and then studied genomic changes that occurred as it adaptively evolved in a stressful nitrogen-limiting environment. We discovered that a characteristic rearrangement between the parental species' chromosomes evolved independently many times, and always within a particular gene encoding a protein that imports nitrogen into the cell. Evolved hybrids carrying this rearrangement grew faster under nitrogen-limitation than ancestral hybrids, suggesting that the rearrangement is beneficial in nitrogen-poor environments. Our results suggest that having the genomes of two different species within a cell provides novel sources of variation for evolution to act upon, leading to adaptations that could not occur in either parental species.
The opportunistic fungal pathogen Candida albicans is a significant medical threat, especially for immunocompromised patients. Experimental research has focused on specific areas of C. albicans biology, with the goal of understanding the multiple factors that contribute to its pathogenic potential. Some of these factors include cell adhesion, invasive or filamentous growth, and the formation of drug-resistant biofilms. The Gene Ontology (GO) (www.geneontology.org) is a standardized vocabulary that the Candida Genome Database (CGD) (www.candidagenome.org) and other groups use to describe the functions of gene products. To improve the breadth and accuracy of pathogenicity-related gene product descriptions and to facilitate the description of as yet uncharacterized but potentially pathogenicity-related genes in Candida species, CGD undertook a three-part project: first, the addition of terms to the biological process branch of the GO to improve the description of fungus-related processes; second, manual recuration of gene product annotations in CGD to use the improved GO vocabulary; and third, computational ortholog-based transfer of GO annotations from experimentally characterized gene products, using these new terms, to uncharacterized orthologs in other Candida species. Through genome annotation and analysis, we identified candidate pathogenicity genes in seven non-C. albicans Candida species and in one additional C. albicans strain, WO-1. We also defined a set of C. albicans genes at the intersection of biofilm formation, filamentous growth, pathogenesis, and phenotypic switching of this opportunistic fungal pathogen, which provides a compelling list of candidates for further experimentation.
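The third step, ortholog-based transfer, can be sketched as follows: copy only experimentally supported annotations from characterized genes to their orthologs, and mark each copy with an orthology-based evidence code. The sketch filters on the standard GO experimental evidence codes (EXP, IDA, IPI, IMP, IGI, IEP) and labels transferred annotations ISO (Inferred from Sequence Orthology). Both choices are assumptions, because the codes CGD actually used are not stated here. The gene identifiers are hypothetical.

```python
# GO experimental evidence codes used as the filter (assumed here).
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def transfer_go_annotations(annotations, orthologs):
    """Project experimentally supported GO terms onto orthologs.

    annotations: iterable of (gene_id, go_term, evidence_code) for characterized genes.
    orthologs: dict mapping a characterized gene_id to a list of ortholog gene_ids.
    Returns a set of (ortholog_id, go_term, "ISO", source_gene_id) tuples; the source
    gene is kept so that each transferred annotation stays traceable.
    """
    transferred = set()
    for gene, term, code in annotations:
        if code not in EXPERIMENTAL_CODES:
            continue  # skip annotations that are themselves predictions
        for ortholog in orthologs.get(gene, []):
            transferred.add((ortholog, term, "ISO", gene))
    return transferred


# Hypothetical genes; the GO term names are real but these annotations are invented.
annotations = [
    ("GENE_A", "filamentous growth", "IMP"),
    ("GENE_A", "cell adhesion", "IEA"),  # electronic annotation: not transferred
    ("GENE_B", "cell adhesion", "IDA"),  # no ortholog listed: nothing to transfer
]
orthologs = {"GENE_A": ["ORTH_A1", "ORTH_A2"]}
for row in sorted(transfer_go_annotations(annotations, orthologs)):
    print(row)
# ('ORTH_A1', 'filamentous growth', 'ISO', 'GENE_A')
# ('ORTH_A2', 'filamentous growth', 'ISO', 'GENE_A')
```

Restricting the source annotations to experimental evidence keeps predictions from being chained onto predictions across species.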
Creating Saccharomyces yeasts capable of efficiently fermenting pentoses such as xylose remains a key challenge in the production of ethanol from lignocellulosic biomass. Metabolic engineering of industrial Saccharomyces cerevisiae strains has yielded xylose-fermenting strains. However, these strains have not yet achieved industrial viability, largely because they ferment xylose far more slowly than glucose. Recently, naturally occurring xylose-utilizing Saccharomyces species have been shown to exist. Uncovering the genetic architecture of such strains will shed further light on xylose metabolism, suggesting additional engineering approaches or possibly even enabling the development of xylose-fermenting yeasts that are not genetically modified. We previously identified a hybrid yeast strain, with a genome that is largely Saccharomyces uvarum, that can grow on xylose as the sole carbon source. To circumvent the sterility of this hybrid strain, we developed a novel method to genetically characterize its xylose-utilization phenotype, using a tetraploid intermediate followed by bulk segregant analysis in conjunction with high-throughput sequencing. We found that this strain's growth on xylose is governed by at least two genetic loci, within which we identified the responsible genes. One locus contains a novel homolog of the aldo-keto reductase gene GRE3, a known xylose-pathway gene. The second contains a homolog of APJ1, which encodes a putative chaperone not previously connected to xylose metabolism. Our work demonstrates that sequencing combined with bulk segregant analysis can also be applied to a hybrid strain that is not genetically tractable and that carries a complex, polygenic trait. It also identifies new avenues for metabolic engineering as well as for the construction of non-genetically modified xylose-fermenting strains.
growth in xylose; bulk segregant analysis; Saccharomyces hybrid; genome sequencing; lignocellulosic ethanol
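In bulk segregant analysis, loci linked to the selected trait show parental allele frequencies in the pooled segregants that depart from the value expected for unlinked regions. The sketch below averages per-SNP frequencies in sliding windows and flags windows that deviate strongly. The expected frequency, window size and threshold are hypothetical parameters; with a tetraploid intermediate, the appropriate expectation may differ from 0.5. The positions and frequencies are invented, and this is not the authors' analysis pipeline.

```python
def bsa_candidate_windows(positions, freqs, window=3, expected=0.5, min_shift=0.35):
    """Flag sliding windows whose mean allele frequency departs from expectation.

    positions: SNP coordinates in order along one chromosome.
    freqs: frequency of one parental allele at each SNP in the selected pool.
    Returns (first_pos, last_pos, mean_freq) for windows with |mean - expected| >= min_shift.
    """
    hits = []
    for i in range(len(freqs) - window + 1):
        mean = sum(freqs[i:i + window]) / window
        if abs(mean - expected) >= min_shift:
            hits.append((positions[i], positions[i + window - 1], mean))
    return hits


# Invented data: SNPs around positions 40-60 are nearly fixed for one parent's allele.
positions = [10, 20, 30, 40, 50, 60, 70]
freqs = [0.50, 0.48, 0.52, 0.95, 0.97, 0.96, 0.50]
hits = bsa_candidate_windows(positions, freqs)
print([(start, end, round(mean, 2)) for start, end, mean in hits])  # [(40, 60, 0.96)]
```

Averaging over windows smooths out sampling noise at individual SNPs. The window size trades resolution against that noise.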
Interspecific hybridization occurs in every eukaryotic kingdom. Although hybrid progeny are frequently at a selective disadvantage, in some instances their increased genome size and complexity may give them greater stress resistance than their ancestors, which can be adaptively advantageous at the edges of the ancestors' ranges. This phenomenon has been repeatedly documented in the field, but the response of hybrid populations to long-term selection has rarely been explored in the laboratory. To fill this knowledge gap, we crossed the two most distantly related members of the Saccharomyces sensu stricto group, S. cerevisiae and S. uvarum. We then established a mixed population of homoploid and aneuploid hybrids to study how different types of selection affect hybrid genome structure.
As temperature was raised incrementally from 31°C to 46.5°C over 500 generations of continuous culture, selection favored loss of the S. uvarum genome, although the kinetics of genome loss differed among independent replicates. Temperature-selected isolates exhibited greater inherent and induced thermal tolerance than the parental species and founding hybrids, and they were also ethanol resistant. In contrast, as exogenous ethanol was increased from 0% to 14% over 500 generations of continuous culture, selection favored euploid S. cerevisiae × S. uvarum hybrids. Ethanol-selected isolates were more ethanol tolerant than S. uvarum and one of the founding hybrids but were not resistant to temperature stress. Relative to the parental species and founding hybrids, temperature-selected strains showed heritable differences in cell wall structure. These took the form of increased resistance to zymolyase digestion and to micafungin, which targets cell wall biosynthesis.
This is the first study to show experimentally that the genomic fate of newly formed interspecific hybrids depends on the type of selection they encounter during evolution, underscoring the importance of the ecological theatre in determining the outcome of the evolutionary play.
Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof.
We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq.
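The general idea of a within-lane, gene-level GC-content normalization can be illustrated with a toy example. The Python sketch below is hypothetical and shows one global-scaling variant. Genes in a single lane are binned by GC-content, and each bin's counts are then rescaled so that the bin median matches the lane median. This is not the EDASeq implementation, and the paper's three approaches may differ in detail. The function name, the equal-sized bins, and the default of ten bins are assumptions made for illustration.

```python
import statistics


def gc_bin_median_normalize(counts, gc, n_bins=10):
    """Hypothetical within-lane, gene-level GC-content normalization for ONE lane.

    Genes are sorted by GC-content and split into n_bins roughly equal-sized bins.
    Counts in each bin are multiplied by (lane median / bin median), so every bin
    ends up with the same median as the lane as a whole (a global-scaling scheme).
    """
    if len(counts) != len(gc):
        raise ValueError("counts and gc must have the same length")
    if not counts or n_bins < 1:
        raise ValueError("need at least one gene and at least one bin")
    # Gene indices ordered from lowest to highest GC-content.
    order = sorted(range(len(gc)), key=lambda i: gc[i])
    lane_median = statistics.median(counts)
    bin_size = -(-len(order) // n_bins)  # ceiling division
    normalized = [0.0] * len(counts)
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        bin_median = statistics.median([counts[i] for i in idx])
        # Leave a bin unscaled if its median is zero (avoids division by zero).
        scale = lane_median / bin_median if bin_median > 0 else 1.0
        for i in idx:
            normalized[i] = counts[i] * scale
    return normalized


# Toy lane: counts rise with GC-content, mimicking a GC-content bias.
counts = [10, 20, 30, 40, 80, 120]
gc = [0.30, 0.35, 0.40, 0.60, 0.65, 0.70]
print(gc_bin_median_normalize(counts, gc, n_bins=2))
# [17.5, 35.0, 52.5, 17.5, 35.0, 52.5]
```

In practice, a step like this would be applied to each lane separately and followed by a between-lane normalization. EDASeq's `withinLaneNormalization` function, for instance, offers several within-lane variants rather than this single toy scheme.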
Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes.
The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available, web-based resource for researchers studying fungi of the genus Aspergillus, which includes organisms of clinical, agricultural and industrial importance. AspGD curators have now completed a comprehensive review of the entire published literature on Aspergillus nidulans and Aspergillus fumigatus, and this annotation is provided with streamlined, ortholog-based navigation of the multispecies information. AspGD facilitates comparative genomics by providing a full-featured genomics viewer, as well as matched and standardized sets of genomic information for the sequenced aspergilli. AspGD also provides resources that foster community interaction and the dissemination of community information. We welcome and encourage feedback at email@example.com.
The Candida Genome Database (CGD, http://www.candidagenome.org/) is an internet-based resource that provides centralized access to genomic sequence data and manually curated functional information about genes and proteins of the fungal pathogen Candida albicans and other Candida species. As the scope of Candida research and the number of sequenced strains and related species have grown in recent years, the need for expanded genomic resources has also grown. To answer this need, CGD has expanded beyond storing data solely for C. albicans and now integrates data from multiple species. Herein we describe the incorporation of this multispecies information, which includes curated gene information and the reference sequence for C. glabrata, as well as orthology relationships that interconnect Locus Summary pages, allowing easy navigation between genes of C. albicans and C. glabrata. These orthology relationships are also used to predict GO annotations of their products. We have also added protein information pages that display domains, structural information and physicochemical properties; bibliographic pages highlighting important topic areas in Candida biology; and a laboratory strain lineage page that describes the lineage of commonly used laboratory strains. All of these data are freely available at http://www.candidagenome.org/. We welcome feedback from the research community at firstname.lastname@example.org.
As organisms adaptively evolve to a new environment, selection results in the improvement of certain traits, bringing about an increase in fitness. Trade-offs may result from this process if function in other traits is reduced in alternative environments, either by the adaptive mutations themselves or by the accumulation of neutral mutations elsewhere in the genome. Although the cost of adaptation has long been a fundamental premise in evolutionary biology, the existence of and molecular basis for trade-offs in alternative environments are not well established. Here, we show that yeast strains evolved under aerobic glucose limitation show surprisingly few trade-offs when cultured in other carbon-limited environments, under either aerobic or anaerobic conditions. However, while adaptive clones consistently outperform their common ancestor under carbon-limiting conditions, in some cases they perform less well than their ancestor in aerobic, carbon-rich environments, indicating that trade-offs can appear when resources are non-limiting. To more deeply understand how adaptation to one condition affects performance in others, we determined steady-state transcript abundance of adaptive clones grown under diverse conditions and performed whole-genome sequencing to identify mutations that distinguish them from one another and from their common ancestor. We identified mutations in genes involved in glucose sensing, signaling, and transport, which, when considered in the context of the expression data, help explain the clones' adaptation to carbon-poor environments. However, the different sets of mutations in each independently evolved clone indicate that multiple mutational paths lead to the adaptive phenotype. We conclude that yeasts that evolve high fitness under one resource-limiting condition also become more fit under other resource-limiting conditions, but may pay a fitness cost when those same resources are abundant.
Microorganisms such as yeast have been used for decades to study adaptive evolution by natural selection. Thirty years ago, in experiments now considered seminal, a strain of yeast was evolved multiple times under carbon limitation. The adaptive changes that gave rise to increases in fitness have previously been studied both phenomenologically and mechanistically, but not in detail at the molecular level. To better understand the basis for these strains' fitness increase, we sequenced their genomes and identified putative adaptive mutations. We found that multiple mutational paths lead to these fitness increases. We also asked whether the evolved yeasts' gains in fitness under the original conditions had, in some cases, diminished their fitness under other conditions. To do so, we evaluated their performance relative to the ancestral strain under the evolutionary condition and two alternative resource-limiting conditions, measuring the ancestral and evolved strains' relative fitness and gene-expression levels under all three conditions. We found scant evidence among evolved strains for fitness trade-offs when nutrients were scarce, but found that a cost was paid when nutrients were plentiful.
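Relative fitness of this kind is often measured with pairwise competition assays. As an illustration only, the Python sketch below assumes that an evolved strain and a reference (for example, the ancestor) are co-cultured and counted at several time points. It then estimates a per-generation selection coefficient as the least-squares slope of the log ratio of their counts. The function name and the counts are invented and do not come from the study. A negative value under a given condition would correspond to the kind of fitness cost described for nutrient-rich environments.

```python
import math


def selection_coefficient(generations, evolved, reference):
    """Hypothetical helper: per-generation selection coefficient from a pairwise
    competition, estimated as the least-squares slope of ln(evolved / reference)
    against generations. s > 0: the evolved strain outcompetes the reference
    (e.g. the ancestor); s < 0: it is outcompeted, i.e. it pays a fitness cost."""
    n = len(generations)
    if n < 2 or len(evolved) != n or len(reference) != n:
        raise ValueError("need >= 2 time points with matching counts")
    y = [math.log(e / r) for e, r in zip(evolved, reference)]
    mean_x = sum(generations) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in generations)
    if sxx == 0:
        raise ValueError("time points must differ")
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(generations, y))
    return sxy / sxx


# Invented counts: the evolved strain doubles relative to the reference every 10 generations.
gens = [0, 10, 20, 30]
evolved = [100, 200, 400, 800]
ancestor = [1000, 1000, 1000, 1000]
print(round(selection_coefficient(gens, evolved, ancestor), 4))  # 0.0693, i.e. ln(2)/10
```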
The fitness landscape captures the relationship between genotype and evolutionary fitness and is a pervasive metaphor used to describe the possible evolutionary trajectories of adaptation. However, little is known about the actual shape of fitness landscapes, including whether valleys of low fitness create local fitness optima, acting as barriers to adaptive change. Here we provide evidence of a rugged molecular fitness landscape arising during an evolution experiment in an asexual population of Saccharomyces cerevisiae. We identify the mutations that arose during the evolution using whole-genome sequencing and use competitive fitness assays to characterize the mutations individually responsible for adaptation. In addition, we find that a fitness valley between two adaptive mutations in the genes MTH1 and HXT6/HXT7 is caused by reciprocal sign epistasis, where the fitness cost of the double mutant prevents the two mutations from being selected in the same genetic background. The constraint enforced by reciprocal sign epistasis keeps the mutations mutually exclusive throughout the experiment, even though adaptive mutations in these two genes arose several times in independent lineages. Our results show that epistasis plays a key role during adaptation and that intergenic interactions can act as barriers between adaptive solutions. These results also provide a new interpretation of the classic Dobzhansky-Muller model of reproductive isolation and display some surprising parallels with mutations in genes often associated with tumors.
How organisms adapt to their environment is of central importance in biology, but the molecular underpinnings of adaptation are difficult to discover. Fitness landscapes illustrate the possible steps adaptive evolution can take to increase the evolutionary fitness of individuals within a population, and the shape of the fitness landscape determines the accessibility of the fittest point on the landscape. On a rugged landscape, negative interactions between mutations create fitness valleys separating fitness peaks, which can constrain adaptation and act as an adaptive barrier. Here, we comprehensively characterized the fitness of mutations that arose in clones during a yeast evolution experiment and found that mutations in two loci, MTH1 and HXT6/HXT7, arose multiple times independently and are individually adaptive. However, when forced to co-occur, the double mutant has a lower fitness than either single mutant or even the wild-type strain. This negative interaction forces these two mutations to remain mutually exclusive during the experimental evolution and results in a rugged fitness landscape, where genetic constraint prevents lineages carrying the MTH1 mutation from reaching the higher fitness peak of HXT6/HXT7. These results show that genetic interactions are central in shaping a very active portion of this fitness landscape.
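Reciprocal sign epistasis can be stated precisely in terms of four genotype fitnesses. The hypothetical Python helper below compares each mutation's effect in the wild-type background with its effect in the background carrying the other mutation. If one effect changes sign, the interaction is classified as sign epistasis, and if both do, as reciprocal sign epistasis. The example fitness values are invented to mimic the qualitative pattern reported, with each single mutant fitter than the wild type and the double mutant below the wild type. They are not measurements from the study.

```python
def classify_sign_epistasis(w_wt, w_a, w_b, w_ab):
    """Hypothetical helper: classify a two-mutation interaction from the fitness of
    the wild type, each single mutant, and the double mutant."""

    def flips(effect_alone, effect_with_other):
        # True if the mutation's effect has opposite signs in the two backgrounds.
        return effect_alone * effect_with_other < 0

    a_flips = flips(w_a - w_wt, w_ab - w_b)  # effect of A without vs. with B
    b_flips = flips(w_b - w_wt, w_ab - w_a)  # effect of B without vs. with A
    if a_flips and b_flips:
        return "reciprocal sign epistasis"
    if a_flips or b_flips:
        return "sign epistasis"
    return "no sign epistasis"


# Invented relative fitness values: each single mutant beats the wild type,
# but the double mutant falls below the wild type (a fitness valley).
print(classify_sign_epistasis(w_wt=1.00, w_a=1.05, w_b=1.08, w_ab=0.97))
# reciprocal sign epistasis
```

The helper only detects changes of sign. Magnitude epistasis, where effects change size but not direction, falls under "no sign epistasis" here.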
Comparative analysis of predicted protein sequences encoded by the genomes of Caenorhabditis elegans and Saccharomyces cerevisiae suggests that most of the core biological functions are carried out by orthologous proteins (proteins of different species that can be traced back to a common ancestor) that occur in comparable numbers. The specialized processes of signal transduction and regulatory control that are unique to the multicellular worm appear to use novel proteins, many of which reuse conserved domains. A major expansion in the number of copies of some of these domains in the worm may have contributed to the advent of multicellularity. The proteins conserved in yeast and worm are likely to have orthologs throughout eukaryotes; in contrast, the proteins unique to the worm may well define metazoans.
GO::TermFinder comprises a set of object-oriented Perl modules for accessing Gene Ontology (GO) information and evaluating and visualizing the collective annotation of a list of genes to GO terms. It can be used to draw conclusions from microarray and other biological data by calculating the statistical significance of each annotation. GO::TermFinder can be used on any system on which Perl can be run, either as a command-line application, in single or batch mode, or as a web-based CGI script.
The full source code and documentation for GO::TermFinder are freely available from http://search.cpan.org/dist/GO-TermFinder/
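The significance of a term's annotation in a gene list can be illustrated with a hypergeometric test, which is commonly used for this kind of term-enrichment analysis. The Python sketch below assumes that model and is not GO::TermFinder's Perl API. The function names and the example counts are hypothetical. The sketch also omits the step of propagating each gene's annotations up the GO hierarchy before counting.

```python
from math import comb


def hypergeometric_p_value(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: k of the n genes in the list are annotated
    to a GO term that annotates K of the N genes in the background population."""
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("require 0 <= k <= n <= N and 0 <= K <= N")
    # comb(a, b) returns 0 when b > a, so impossible configurations add nothing.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def bonferroni(p, n_tests):
    """Bonferroni correction for the number of GO terms tested (capped at 1)."""
    return min(1.0, p * n_tests)


# Invented example: 4 of 10 listed genes carry a term that annotates 50 of 6000 genes.
p = hypergeometric_p_value(k=4, n=10, K=50, N=6000)
print(p, bonferroni(p, n_tests=200))
```

Bonferroni is shown as the simplest multiple-testing correction. Other corrections, such as false discovery rate control, can be substituted.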
Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.