Related Articles

1.  A Catalog of Neutral and Deleterious Polymorphism in Yeast 
PLoS Genetics  2008;4(8):e1000183.
The abundance and identity of functional variation segregating in natural populations are paramount to dissecting the molecular basis of quantitative traits as well as human genetic diseases. Genome sequencing of multiple organisms of the same species provides an efficient means of cataloging rearrangements, insertion, or deletion polymorphisms (InDels) and single-nucleotide polymorphisms (SNPs). While inbreeding depression and heterosis imply that a substantial amount of polymorphism is deleterious, distinguishing deleterious from neutral polymorphism remains a significant challenge. To identify deleterious and neutral DNA sequence variation within Saccharomyces cerevisiae, we sequenced the genomes of a vineyard strain and an oak tree strain and compared them to a reference genome. Among these three strains, 6% of the genome is variable, mostly attributable to variation in genome content that results from large InDels. Out of the 88,000 polymorphisms identified, 93% are SNPs and a small but significant fraction can be attributed to recent interspecific introgression and ectopic gene conversion. In comparison to the reference genome, there is substantial evidence for functional variation in gene content and structure that results from large InDels, frame-shifts, and polymorphic start and stop codons. Comparison of polymorphism to divergence reveals scant evidence for positive selection but an abundance of evidence for deleterious SNPs. We estimate that 12% of coding and 7% of noncoding SNPs are deleterious. Based on divergence among 11 yeast species, we identified 1,666 nonsynonymous SNPs that disrupt conserved amino acids and 1,863 noncoding SNPs that disrupt conserved noncoding motifs. The deleterious coding SNPs include those known to affect quantitative traits, and a subset of the deleterious noncoding SNPs occurs in the promoters of genes that show allele-specific expression, implying that some cis-regulatory SNPs are deleterious. 
Our results show that the genome sequences of both closely and distantly related species provide a means of identifying deleterious polymorphisms that disrupt functionally conserved coding and noncoding sequences.
Author Summary
DNA sequence variation makes an important contribution to most traits that vary in natural populations. However, mapping mutations that underlie a trait of interest is a significant challenge. Genome sequencing of multiple organisms provides a complete list of DNA sequence differences responsible for any trait that differs among the organisms. Yet, distinguishing those DNA sequence variants that contribute to a trait from all other variants is not easy. Here, we sequence the genomes of two strains of yeast and, through comparisons with a reference genome, we catalog multiple types of DNA sequence variation among the three strains. Using a variety of comparative genomics methods, we show that a substantial fraction of DNA sequence variations has deleterious effects on fitness. Finally, we show that a subset of deleterious mutations is associated with changes in gene expression levels. Our results imply that comparative genomics methods will be a valuable approach to identifying DNA sequence changes underlying numerous traits of interest.
doi:10.1371/journal.pgen.1000183
PMCID: PMC2515631  PMID: 18769710
2.  Evolution of Phosphoregulation: Comparison of Phosphorylation Patterns across Yeast Species 
PLoS Biology  2009;7(6):e1000134.
Analysis of the phosphoproteomes and the gene interaction networks of divergent yeast species defines the relative contribution of changes in protein phosphorylation pathways to the generation of phenotypic diversity.
The extent to which different cellular components generate phenotypic diversity is an ongoing debate in evolutionary biology that has yet to be addressed by quantitative comparative studies. We conducted an in vivo mass-spectrometry study of the phosphoproteomes of three yeast species (Saccharomyces cerevisiae, Candida albicans, and Schizosaccharomyces pombe) in order to quantify the evolutionary rate of change of phosphorylation. We estimate that kinase–substrate interactions change, at most, two orders of magnitude more slowly than transcription factor (TF)–promoter interactions. Our computational analysis linking kinases to putative substrates recapitulates known phosphoregulation events and provides putative evolutionary histories for the kinase regulation of protein complexes across 11 yeast species. To validate these trends, we used the E-MAP approach to analyze over 2,000 quantitative genetic interactions in S. cerevisiae and Sc. pombe, which demonstrated that protein kinases, and to a greater extent TFs, show lower than average conservation of genetic interactions. We propose therefore that protein kinases are an important source of phenotypic diversity.
Author Summary
Natural selection at a population level requires phenotypic diversity, which at the molecular level arises by mutation of the genome of each individual. What kinds of changes at the level of the DNA are most important for the generation of phenotypic differences remains a fundamental question in evolutionary biology. One well-studied source of phenotypic diversity is mutation in gene regulatory regions that results in changes in gene expression, but what proportion of phenotypic diversity is due to such mutations is not entirely clear. We investigated the relative contribution to phenotypic diversity of mutations in protein-coding regions compared to mutations in gene regulatory sequences. Given the important regulatory role played by phosphorylation across biological systems, we focused on mutations in protein-coding regions that alter protein–protein interactions involved in the binding of kinases to their substrate proteins. We studied the evolution of this “phosphoregulation” by analyzing the in vivo complement of phosphorylated proteins (the “phosphoproteome”) in three highly diverged yeast species—the budding yeast Saccharomyces cerevisiae, the pathogenic yeast Candida albicans, and the fission yeast Schizosaccharomyces pombe—and integrating those data with existing data on thousands of known genetic interactions from S. cerevisiae and Sc. pombe. We show that kinase–substrate interactions are altered at a rate that is at most two orders of magnitude slower than the alteration of transcription factor (TF)–promoter interactions, whereas TFs and kinases both show a faster than average rate of functional divergence estimated by the cross-species analysis of genetic interactions. 
Our data provide a quantitative estimate of the relative frequencies of different kinds of functionally relevant mutations and demonstrate that, like mutations in gene regulatory regions, mutations that result in changes in kinase–substrate interactions are an important source of phenotypic diversity.
doi:10.1371/journal.pbio.1000134
PMCID: PMC2691599  PMID: 19547744
3.  Whole-Genome Comparison Reveals Novel Genetic Elements That Characterize the Genome of Industrial Strains of Saccharomyces cerevisiae 
PLoS Genetics  2011;7(2):e1001287.
Human intervention has subjected the yeast Saccharomyces cerevisiae to multiple rounds of independent domestication and thousands of generations of artificial selection. As a result, this species comprises a genetically diverse collection of natural isolates as well as domesticated strains that are used in specific industrial applications. However, the scope of genetic diversity that was captured during the domesticated evolution of the industrial representatives of this important organism remains to be determined. To begin to address this, we have produced whole-genome assemblies of six commercial strains of S. cerevisiae (four wine and two brewing strains). These represent the first genome assemblies produced from S. cerevisiae strains in their industrially-used forms and the first high-quality assemblies for S. cerevisiae strains used in brewing. By comparing these sequences to six existing high-coverage S. cerevisiae genome assemblies, clear signatures were found that defined each industrial class of yeast. This genetic variation comprised both single nucleotide polymorphisms and large-scale insertions and deletions, with the latter often being associated with ORF heterogeneity between strains. This included the discovery of more than twenty probable genes that had not been identified previously in the S. cerevisiae genome. Comparison of this large number of S. cerevisiae strains also enabled the characterization of a cluster of five ORFs that have integrated into the genomes of the wine and bioethanol strains on multiple occasions and at diverse genomic locations via what appears to involve the resolution of a circular DNA intermediate. This work suggests that, despite the scrutiny that has been directed at the yeast genome, there remains a significant reservoir of ORFs and novel modes of genetic transmission that may have significant phenotypic impact in this important model and industrial species.
Author Summary
The yeast S. cerevisiae has been associated with human activity for thousands of years in industries such as baking, brewing, and winemaking. During this time, humans have effectively domesticated this microorganism, with different industries selecting for specific desirable phenotypic traits. This has resulted in the species S. cerevisiae comprising a genetically diverse collection of individual strains that are often suited to very specific roles (e.g. wine strains produce wine but not beer and vice versa). In order to understand the genetic differences that underpin these diverse industrial characteristics, we have sequenced the genomes of six industrial strains of S. cerevisiae that comprise four strains used in commercial wine production and two strains used in beer brewing. By comparing these genome sequences to existing S. cerevisiae genome sequences from laboratory, pathogenic, bioethanol, and “natural” isolates, we were able to identify numerous genetic differences among these strains including the presence of novel open reading frames and genomic rearrangements, which may provide the basis for the phenotypic differences observed among these strains.
doi:10.1371/journal.pgen.1001287
PMCID: PMC3033381  PMID: 21304888
4.  Global Mapping of Transposon Location 
PLoS Genetics  2006;2(12):e212.
Transposable genetic elements are ubiquitous, yet their presence or absence at any given position within a genome can vary between individual cells, tissues, or strains. Transposable elements have profound impacts on host genomes by altering gene expression, assisting in genomic rearrangements, causing insertional mutations, and serving as sources of phenotypic variation. Characterizing a genome's full complement of transposons requires whole genome sequencing, precluding simple studies of the impact of transposition on interindividual variation. Here, we describe a global mapping approach for identifying transposon locations in any genome, using a combination of transposon-specific DNA extraction and microarray-based comparative hybridization analysis. We use this approach to map the repertoire of endogenous transposons in different laboratory strains of Saccharomyces cerevisiae and demonstrate that transposons are a source of extensive genomic variation. We also apply this method to mapping bacterial transposon insertion sites in a yeast genomic library. This unique whole genome view of transposon location will facilitate our exploration of transposon dynamics, as well as defining bases for individual differences and adaptive potential.
Synopsis
Transposons, or mobile DNA sequences—first described by Barbara McClintock—are interesting and important residents of all genomes. They are involved in gene creation and regulation, chromosome evolution, and generation of mutations, events that can occur on hugely varying time scales, from millions of years to mere days in the lab. Some transposons have even been “tamed” by geneticists for use as tools for marking genes and making mutations. In yeast, genome sequencing has given us a snapshot of transposons present in one strain at one particular time. The authors developed a method to easily, accurately, and globally track transposons in order to study how their locations change in different strains or during an experiment. The method involves finding pieces of DNA that contain the ends of transposons along with neighboring DNA and attaching these segments to magnetic beads. A magnet is then used to separate the selected DNAs away from the rest of the genome. The transposon-associated DNA is labeled with dyes and applied to a microarray, a glass slide with over 40,000 unique sequence features of yeast DNA attached. Each feature that lights up with the dye marks a transposon location. This new technique allows investigators to easily identify specific strains, to accurately monitor mobile portions of the genome, and to determine the role of transposons in phenotypic differences.
doi:10.1371/journal.pgen.0020212
PMCID: PMC1698948  PMID: 17173485
5.  Telomere Length as a Quantitative Trait: Genome-Wide Survey and Genetic Mapping of Telomere Length-Control Genes in Yeast 
PLoS Genetics  2006;2(3):e35.
Telomere length-variation in deletion strains of Saccharomyces cerevisiae was used to identify genes and pathways that regulate telomere length. We found 72 genes that when deleted confer short telomeres, and 80 genes that confer long telomeres relative to those of wild-type yeast. Among identified genes, 88 have not been previously implicated in telomere length control. Genes that regulate telomere length span a variety of functions that can be broadly separated into telomerase-dependent and telomerase-independent pathways. We also found 39 genes that have an important role in telomere maintenance or cell proliferation in the absence of telomerase, including genes that participate in deoxyribonucleotide biosynthesis, sister chromatid cohesion, and vacuolar protein sorting. Given the large number of loci identified, we investigated telomere lengths in 13 wild yeast strains and found substantial natural variation in telomere length among the isolates. Furthermore, we crossed a wild isolate to a laboratory strain and analyzed telomere length in 122 progeny. Genome-wide linkage analysis among these segregants revealed two loci that account for 30%–35% of telomere length-variation between the strains. These findings support a general model of telomere length-variation in outbred populations that results from polymorphisms at a large number of loci. Furthermore, our results laid the foundation for studying genetic determinants of telomere length-variation and their roles in human disease.
Synopsis
Telomere maintenance is of great importance for ensuring genome stability in organisms with linear genomes. In humans, telomeres shorten as a function of age and serve as a marker of cell replication history. Understanding the genetic differences in telomere length-maintenance may help provide insights into the basis for different rates of aging among individuals and differences in individuals' propensity for aging-associated diseases such as cancer. Studies in yeast and other model organisms have defined several pathways that ensure stability of chromosome ends. In order to capture the full complement of genes that participate in telomere maintenance in the yeast Saccharomyces cerevisiae, the authors undertook a comprehensive screen for genes that affect telomere length. Among 152 identified genes, the authors found 39 genes whose function is critical for telomere maintenance in the absence of telomerase. The authors extended their studies from laboratory yeast strains to outbred populations of yeast and discovered significant phenotypic variation in telomere length among the isolates. Telomere length-analysis of a cross between a wild yeast isolate and a laboratory strain supports a general model of telomere length-variation in outbred populations that results from polymorphisms at a large number of loci. This finding provides a basis for genetic studies of telomere maintenance in human populations.
doi:10.1371/journal.pgen.0020035
PMCID: PMC1401499  PMID: 16552446
6.  Local Regulatory Variation in Saccharomyces cerevisiae 
PLoS Genetics  2005;1(2):e25.
Naturally occurring sequence variation that affects gene expression is an important source of phenotypic differences among individuals within a species. We and others have previously shown that such regulatory variation can occur both at the same locus as the gene whose expression it affects (local regulatory variation) and elsewhere in the genome at trans-acting factors. Here we present a detailed analysis of genome-wide local regulatory variation in Saccharomyces cerevisiae. We used genetic linkage analysis to show that nearly a quarter of all yeast genes contain local regulatory variation between two divergent strains. We measured allele-specific expression in a diploid hybrid of the two strains for 77 genes showing strong self-linkage and found that in 52%–78% of these genes, local regulatory variation acts directly in cis. We also experimentally confirmed one example in which local regulatory variation in the gene AMN1 acts in trans through a feedback loop. Genome-wide sequence analysis revealed that genes subject to local regulatory variation show increased polymorphism in the promoter regions, and that some but not all of this increase is due to polymorphisms in predicted transcription factor binding sites. Increased polymorphism was also found in the 3′ untranslated regions of these genes. These findings point to the importance of cis-acting variation, but also suggest that there is a diverse set of mechanisms through which local variation can affect gene expression levels.
Synopsis
Variation in DNA sequences in and around a gene can contribute to differences between individuals by affecting the gene's expression. The authors have used a variety of methods to characterize this local DNA sequence variation on a large scale in two strains of the budding yeast Saccharomyces cerevisiae. Their results suggest that the expression levels of a sizeable fraction of genes are affected by local sequence variation. Many local variants alter the expression of only one of two copies of a gene in diploid hybrid yeast, but other local variants can affect both copies equally. The authors also found that sequence variation in particular regions of DNA near genes, both upstream and downstream of coding sequences and especially in transcription factor binding sites, is most likely to affect gene expression. These results provide a detailed view of local sequence variation that affects the expression of nearby genes in S. cerevisiae.
doi:10.1371/journal.pgen.0010025
PMCID: PMC1189075  PMID: 16121257
7.  A synthetic library of RNA control modules for predictable tuning of gene expression in yeast 
The authors describe a library of synthetic RNA control elements that provide programmable post-transcriptional regulation of gene expression in yeast. This toolkit is then used to study endogenous regulation of the ergosterol biosynthetic pathway.
Rnt1p hairpins can act as effective posttranscriptional gene regulatory elements in the yeast Saccharomyces cerevisiae.
Modification of the cleavage efficiency box (CEB) region of an Rnt1p hairpin can modulate Rnt1p cleavage rates, and thus the resulting gene regulatory activities of the hairpin control elements.
A library of Rnt1p hairpins can act as a set of synthetic control modules that provide predictable tuning of gene expression over a wide range of expression levels.
The Rnt1p-based control elements can be combined with any promoter to support titration of regulatory strategies encoded in transcriptional regulators, including feedback control around endogenous proteins.
The design of complex biological systems encoding desired functions requires the development of genetic tools for the precise control of protein levels in cells (Elowitz and Leibler, 2000; Gardner et al, 2000; Basu et al, 2004). For example, in the design of engineered metabolic networks, the tuning of enzyme levels is often critical for overcoming metabolic burden (Jones et al, 2000; Jin et al, 2003), the accumulation of toxic intermediates (Zhu et al, 2001; Pfleger et al, 2006) and detrimental consequences associated with the redirection of cellular resources from native pathways (Alper et al, 2005b; Paradise et al, 2008). Various examples of libraries of genetic control modules have been described that have been generated through the randomization of well-characterized gene expression control elements (Basu et al, 2004; Pfleger et al, 2006; Anderson et al, 2007). However, most of these studies have been conducted in Escherichia coli such that there is a lack of similar tools for other cellular chassis.
The budding yeast, Saccharomyces cerevisiae, is a relevant organism in industrial processes, including biosynthesis and biomanufacturing strategies (Ostergaard et al, 2000; Szczebara et al, 2003; Nguyen et al, 2004; Veen and Lang, 2004; Ro et al, 2006; Hawkins and Smolke, 2008). The majority of existing methods for tuning gene expression in yeast are through transcriptional control mechanisms in the form of inducible and constitutive promoter systems (Hawkins and Smolke, 2006; Nevoigt et al, 2006; Nevoigt et al, 2007). RNA-based control modules based on posttranscriptional mechanisms may offer an advantage in that they can be coupled to any promoter of choice, providing for enhanced control strategies and finer resolution tuning of protein expression levels. Although posttranscriptional control elements, such as internal ribosome entry sites and AU-rich elements, have been applied to regulate heterologous gene expression in yeast (Vasudevan and Peltz, 2001; Zhou et al, 2001; Lautz et al, 2010), these control elements have exhibited substantial variability in activity and have not been engineered as synthetic libraries exhibiting a wide range of predictable gene regulatory activities.
RNase III enzymes are a class of enzymes that cleave double-stranded RNA. The S. cerevisiae RNase III enzyme, Rnt1p, exhibits a number of unique features that allow it to recognize very specific RNA hairpin substrates that harbor a consensus AGNN tetraloop sequence. Despite extensive characterization of this enzyme and its demonstrated role in processing non-coding RNA and mRNA, neither natural nor synthetic Rnt1p substrates have been used to control gene expression levels in yeast. Therefore, we developed a genetic control system based on directed Rnt1p processing of a target transcript. Specifically, Rnt1p hairpins were immediately flanked by a clamp sequence (that insulates the hairpin structure from surrounding sequences) and placed downstream of a gene of interest, where they direct cleavage and thus inactivate the transcript, resulting in rapid transcript degradation. We validated this Rnt1p-based control system with two Rnt1p hairpins based on previous in vitro studies and demonstrated that Rnt1p hairpins can act as gene control modules in yeast.
Previous in vitro studies had identified three key regions in Rnt1p hairpins: the cleavage efficiency box (CEB), the binding stability box and the initial binding and positioning box (Lamontagne et al, 2003). The CEB region affects the processing of the hairpin stem by Rnt1p, such that nucleotide (nt) modifications in this region are expected to specifically modulate the cleavage rate. We created an Rnt1p hairpin library by randomizing the CEB region (12 nt). This library was placed downstream of a fluorescent reporter protein and a cell-based screening assay was used to identify functional members of the library that resulted in lowered fluorescence levels. The functional Rnt1p hairpin library comprises 16 unique sequences that span a large gene regulatory range—from 8 to 85% (Figure 3A)—and are fairly evenly distributed across this range. The negative controls for each sequence (constructed by mutating the required consensus tetraloop sequence) demonstrated that the majority of gene knockdown observed from each hairpin is due to Rnt1p processing (Figure 3B). A correlation analysis on the transcript and protein levels for each library hairpin construct indicated a strong positive correlation and a strong preservation of rank order between the two in vivo regulatory measurements (Figure 3C). Characterization of the hairpin library in a different genetic context supported the broader utility of these control modules for providing predictable gene control.
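The rank-order comparison between transcript and protein levels described above can be illustrated with a Spearman rank correlation. The sketch below is a minimal standard-library illustration, not the authors' analysis: the function names are our own, and the expression values are hypothetical placeholders rather than data from the study.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical relative expression levels for five hairpin constructs
# (fraction of the no-hairpin control); NOT values from the paper.
transcript_levels = [0.10, 0.25, 0.40, 0.55, 0.90]
protein_levels = [0.08, 0.30, 0.35, 0.60, 0.85]

rho = spearman(transcript_levels, protein_levels)
print(f"Spearman rho = {rho:.3f}")
```

A coefficient near 1 indicates that constructs keep the same ordering in both measurements, which is what "preservation of rank order" refers to; Pearson correlation on the raw values would additionally test for a linear relationship.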
We applied the Rnt1p control modules to titrating a key enzyme component of the endogenous ergosterol biosynthesis network—the ERG9 genetic target. Squalene synthase, encoded by the ERG9 gene, is responsible for catalyzing the conversion of two molecules of farnesyl pyrophosphate to squalene, the first precursor in the ergosterol biosynthetic pathway in S. cerevisiae (Poulter and Rilling, 1981; Figure 6A). We integrated several members of the Rnt1p hairpin library downstream of the native ERG9 gene to cover the regulatory range of the library (Figure 6B). A strong positive correlation and preservation of rank order was observed between the ERG9 transcript levels and their yEGFP3 counterparts (Figure 6C). However, ERG9 expression levels did not fall below ∼40%, regardless of the Rnt1p hairpin strength, indicating that a previously identified endogenous feedback mechanism associated with the native ERG9 promoter acts to maintain ERG9 expression levels at that threshold value. In addition, most strains exhibited high relative ergosterol levels and growth rates, except for two strains harboring synthetic Rnt1p hairpins that resulted in the lowest expression levels, which exhibited a significant reduction in the amount of ergosterol produced and growth rate (Figure 6D and E). Our studies indicate that the endogenous feedback mechanism can be acting to increase ERG9 expression levels to the desired set point in the slow-growing strains, but the perturbations introduced in these strains may result in other impacts on the pathway that inhibit the endogenous control systems from restoring cellular growth to wild-type rates. These studies support the unique ability of the synthetic Rnt1p hairpin library to systematically titrate pathway enzyme levels by introducing precise perturbations around major control points while maintaining native cellular control strategies acting through transcriptional mechanisms.
Advances in synthetic biology have resulted in the development of genetic tools that support the design of complex biological systems encoding desired functions. The majority of efforts have focused on the development of regulatory tools in bacteria, whereas fewer tools exist for the tuning of expression levels in eukaryotic organisms. Here, we describe a novel class of RNA-based control modules that provide predictable tuning of expression levels in the yeast Saccharomyces cerevisiae. A library of synthetic control modules that act through posttranscriptional RNase cleavage mechanisms was generated through an in vivo screen, in which structural engineering methods were applied to enhance the insulation and modularity of the resulting components. This new class of control elements can be combined with any promoter to support titration of regulatory strategies encoded in transcriptional regulators and thus more sophisticated control schemes. We applied these synthetic controllers to the systematic titration of flux through the ergosterol biosynthesis pathway, providing insight into endogenous control strategies and highlighting the utility of this control module library for manipulating and probing biological systems.
doi:10.1038/msb.2011.4
PMCID: PMC3094065  PMID: 21364573
gene expression control; metabolic flux control; RNA controller; Rnt1p hairpin; synthetic biology
8.  Whole-genome sequencing of a laboratory-evolved yeast strain 
BMC Genomics  2010;11:88.
Background
Experimental evolution of microbial populations provides a unique opportunity to study evolutionary adaptation in response to controlled selective pressures. However, until recently it has been difficult to identify the precise genetic changes underlying adaptation at a genome-wide scale. New DNA sequencing technologies now allow the genome of parental and evolved strains of microorganisms to be rapidly determined.
Results
We sequenced >93.5% of the genome of a laboratory-evolved strain of the yeast Saccharomyces cerevisiae and its ancestor at >28× depth. Both single nucleotide polymorphisms and copy number amplifications were found, with specific gains over array-based methodologies previously used to analyze these genomes. Applying a segmentation algorithm to quantify structural changes, we determined the approximate genomic boundaries of a 5× gene amplification. These boundaries guided the recovery of breakpoint sequences, which provide insights into the nature of a complex genomic rearrangement.
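The idea of locating an amplification from sequencing depth can be sketched as follows. This is a deliberately simple thresholding of per-window log2 depth ratios, offered only as an illustration; it is not the segmentation algorithm used in the study, and the function names, window depths, threshold, and pseudocount are all assumptions.

```python
import math


def log2_ratios(evolved, ancestor, pseudocount=1.0):
    """Per-window log2(evolved / ancestor) depth ratio.

    The pseudocount (an assumed value) avoids division by zero
    in windows with no coverage.
    """
    return [
        math.log2((e + pseudocount) / (a + pseudocount))
        for e, a in zip(evolved, ancestor)
    ]


def amplified_segments(ratios, threshold=1.0, min_windows=2):
    """Return (start, end) window indices (end exclusive) of runs where
    the log2 ratio stays at or above the threshold for at least
    min_windows consecutive windows."""
    segments = []
    start = None
    for i, ratio in enumerate(ratios):
        if ratio >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_windows:
                segments.append((start, i))
            start = None
    # Close a run that extends to the last window.
    if start is not None and len(ratios) - start >= min_windows:
        segments.append((start, len(ratios)))
    return segments


# Hypothetical mean read depth per fixed-size genomic window;
# NOT data from the paper.
ancestor_depth = [30, 29, 31, 30, 30, 28, 30, 31]
evolved_depth = [30, 31, 150, 148, 155, 29, 30, 30]

ratios = log2_ratios(evolved_depth, ancestor_depth)
segments = amplified_segments(ratios)
print(segments)
```

The returned window indices give only approximate boundaries; as the abstract notes, the exact breakpoints still have to be recovered from reads spanning the junctions.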
Conclusions
This study suggests that whole-genome sequencing can provide a rapid approach to uncover the genetic basis of evolutionary adaptations, with further applications in the study of laboratory selections and mutagenesis screens. In addition, we show how single-end, short read sequencing data can provide detailed information about structural rearrangements, and generate predictions about the genomic features and processes that underlie genome plasticity.
doi:10.1186/1471-2164-11-88
PMCID: PMC2829512  PMID: 20128923
9.  Genome sequencing and genetic breeding of a bioethanol Saccharomyces cerevisiae strain YJS329 
BMC Genomics  2012;13:479.
Background
Environmental stresses and inhibitors encountered by Saccharomyces cerevisiae strains are the main limiting factors in bioethanol fermentation. Strains with different genetic backgrounds usually show diverse stress tolerance responses. An understanding of the mechanisms underlying these phenotypic diversities within S. cerevisiae populations could guide the construction of strains with desired traits.
Results
We explored the genetic characteristics of the bioethanol S. cerevisiae strain YJS329 and elucidated how genetic variations in its genome were correlated with specified traits compared to similar traits in the S288c-derived strain, BYZ1. Karyotypic electrophoresis combined with array-comparative genomic hybridization indicated that YJS329 was a diploid strain with a relatively constant genome, owing to its fewer Ty elements and the lack of structural polymorphisms between its homologous chromosomes. By comparing the sequence with the S288c genome, a total of 64,998 SNPs, 7,093 indels and 11 unique genes were identified in the genome of YJS329-derived haploid strain YJSH1 through whole-genome sequencing. Transcriptome comparison using RNA-Seq identified which of the differentially expressed genes were the main contributors to the phenotypic differences between YJS329 and BYZ1. By combining the results obtained from the genome sequences and the transcriptomes, we predicted how the SNPs, indels and chromosomal copy number variations may affect the mRNA expression profiles and phenotypes of the yeast strains. Furthermore, some genetic breeding strategies to improve the adaptabilities of YJS329 were designed and experimentally verified.
Conclusions
Through comparative functional genomic analysis, we have provided some insights into the mechanisms underlying the specific traits of the bioethanol strain YJS329. The work reported here has not only enriched the available genetic resources of yeast but has also indicated how functional genomic studies can be used to improve genetic breeding in yeast.
doi:10.1186/1471-2164-13-479
PMCID: PMC3484046  PMID: 22978491
Bioethanol; Saccharomyces cerevisiae; Stress; Genome; RNA-Seq
10.  3′ Untranslated Regions Mediate Transcriptional Interference between Convergent Genes Both Locally and Ectopically in Saccharomyces cerevisiae 
PLoS Genetics  2014;10(1):e1004021.
Paired sense and antisense (S/AS) genes located in cis represent a structural feature common to the genomes of both prokaryotes and eukaryotes, and produce partially complementary transcripts. We used published genome and transcriptome sequence data and found that over 20% of genes (645 pairs) in the budding yeast Saccharomyces cerevisiae genome are arranged in convergent pairs with overlapping 3′-UTRs. Using published microarray transcriptome data from the standard laboratory strain of S. cerevisiae, our analysis revealed that expression levels of convergent pairs are significantly negatively correlated across a broad range of environments. This implies an important role for convergent genes in the regulation of gene expression, which may compensate for the absence of RNA-dependent mechanisms such as micro RNAs in budding yeast. We selected four representative convergent gene pairs and used expression assays in wild type yeast and its genetically modified strains to explore the underlying patterns of gene expression. Results showed that convergent genes are reciprocally regulated in yeast populations and in single cells, whereby an increase in expression of one gene produces a decrease in the expression of the other, and vice-versa. Time course analysis of the cell cycle illustrated the functional significance of this relationship for the three pairs with relevant functional roles. Furthermore, a series of genetic modifications revealed that the 3′-UTR sequence plays an essential causal role in mediating transcriptional interference, which requires neither the sequence of the open reading frame nor the translation of fully functional proteins. 
More importantly, transcriptional interference persisted even when one of the convergent genes was expressed ectopically (in trans) and therefore does not depend on the cis arrangement of convergent genes; we conclude that the mechanism of transcriptional interference cannot be explained by the transcriptional collision model, which postulates a clash between simultaneous transcriptional processes occurring on opposite DNA strands.
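The negative correlation between the expression levels of convergent genes can be illustrated with a correlation coefficient computed across conditions. The following is a minimal sketch, assuming a Pearson coefficient and made-up expression values for one hypothetical gene pair; the abstract does not state which statistic the authors used, and `gene_a` and `gene_b` are illustrative names only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(sxx * syy)

# Hypothetical expression levels of one convergent pair across four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [8.0, 6.0, 4.0, 2.0]

# r is negative when one gene's expression rises as the other's falls.
r = pearson(gene_a, gene_b)
```

A coefficient near -1 for a pair would be in line with the reciprocal regulation described above; values near 0 would not.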
Author Summary
In the compact genome of the budding yeast Saccharomyces cerevisiae, genes are frequently organized into convergent pairs that are transcribed from opposing DNA strands in opposite directions and have overlapping 3′-UTRs. Here we explore the negative correlation in expression levels between convergent genes using a set of 645 convergent pairs in the yeast genome, identified from published genomic and transcriptomic sequence data and accounting for ∼20% of total yeast genes. Analysis of published microarray experiments confirmed that the negative correlation in expression between convergent genes occurs across a broad range of growth conditions. This implies that such transcriptional interference is an important means of regulating gene expression in yeast, especially in the absence of other eukaryotic RNA-dependent mechanisms such as micro RNAs. We focused on profiling the expression of four pairs of convergent genes in wild type yeast and its genetically modified strains, to explore the causes and mechanisms of transcriptional interference. We demonstrate that the 3′-UTR sequence alone plays the essential and causal role in interference between convergent genes. Intriguingly, transcriptional interference occurs even when one of the convergent genes is expressed from elsewhere in the genome (in trans), raising new questions about how transcriptional interference operates.
doi:10.1371/journal.pgen.1004021
PMCID: PMC3900390  PMID: 24465217
11.  Bulk Segregant Analysis by High-Throughput Sequencing Reveals a Novel Xylose Utilization Gene from Saccharomyces cerevisiae 
PLoS Genetics  2010;6(5):e1000942.
Fermentation of xylose is a fundamental requirement for the efficient production of ethanol from lignocellulosic biomass sources. Although they aggressively ferment hexoses, it has long been thought that native Saccharomyces cerevisiae strains cannot grow fermentatively or non-fermentatively on xylose. Population surveys have uncovered a few naturally occurring strains that are weakly xylose-positive, and some S. cerevisiae have been genetically engineered to ferment xylose, but no strain, either natural or engineered, has yet been reported to ferment xylose as efficiently as glucose. Here, we used a medium-throughput screen to identify Saccharomyces strains that can increase in optical density when xylose is presented as the sole carbon source. We identified 38 strains that have this xylose utilization phenotype, including strains of S. cerevisiae, other sensu stricto members, and hybrids between them. All the S. cerevisiae xylose-utilizing strains we identified are wine yeasts, and for those that could produce meiotic progeny, the xylose phenotype segregates as a single gene trait. We mapped this gene by Bulk Segregant Analysis (BSA) using tiling microarrays and high-throughput sequencing. The gene is a putative xylitol dehydrogenase, which we name XDH1, and is located in the subtelomeric region of the right end of chromosome XV in a region not present in the S288c reference genome. We further characterized the xylose phenotype by performing gene expression microarrays and by genetically dissecting the endogenous Saccharomyces xylose pathway. We have demonstrated that natural S. cerevisiae yeasts are capable of utilizing xylose as the sole carbon source, characterized the genetic basis for this trait as well as the endogenous xylose utilization pathway, and demonstrated the feasibility of BSA using high-throughput sequencing.
Author Summary
Ethanol made from fermentation of lignocellulosic biomass by baker's yeast can be considered “carbon neutral” and is one alternative to fossil fuels for powering vehicles. One of the recognized requirements for cost-effective and energy-efficient cellulosic ethanol production is the need to convert the sugar xylose—a major component of cellulosic biomass—into ethanol; however, it has traditionally been thought that baker's yeast cannot ferment xylose. We sought to investigate this assumption by looking at close relatives of baker's yeast from around the world to see if any had an intrinsic ability to grow on xylose. We identified a number of yeasts, many of them used in winemaking, that grow very slowly on this sugar, and studied one in detail. We determined that in this particular yeast the ability to grow on xylose is due to the presence of a single gene, which we named XDH1. This gene is not present in the typical laboratory strains of baker's yeast, but appears to be very common in natural wine yeasts. This gene could be useful in continuing efforts to make yeasts that can efficiently ferment xylose to ethanol.
doi:10.1371/journal.pgen.1000942
PMCID: PMC2869308  PMID: 20485559
12.  Cross-species discovery of syncretic drug combinations that potentiate the antifungal fluconazole 
The authors screen for compounds that show synergistic antifungal activity when combined with the widely-used fungistatic drug fluconazole. Chemogenomic profiling explains the mode of action of synergistic drugs and allows the prediction of additional drug synergies.
Chemical screens with a library enriched for known drugs identified a diverse set of 148 compounds that potentiated the action of the antifungal drug fluconazole against the fungal pathogens Cryptococcus neoformans, Cryptococcus gattii and Candida albicans, and the model yeast Saccharomyces cerevisiae, often in a species-specific manner. Chemogenomic profiles of six confirmed hits in S. cerevisiae revealed different modes of action and enabled the prediction of additional synergistic combinations; three-way synergistic interactions exhibited even stronger synergies at low doses of fluconazole. The synergistic combination of fluconazole and the antidepressant sertraline was active against fluconazole-resistant clinical fungal isolates and in an in vivo model of Cryptococcal infection.
Rising fungal infection rates, especially among immune-suppressed individuals, represent a serious clinical challenge (Gullo, 2009). Cancer, organ transplant and HIV patients, for example, often succumb to opportunistic fungal pathogens. The limited repertoire of approved antifungal agents and emerging drug resistance in the clinic further complicate the effective treatment of systemic fungal infections. At the molecular level, the paucity of fungal-specific essential targets arises from the conserved nature of cellular functions from yeast to humans, as well as from the fact that many essential yeast genes can confer viability at a fraction of wild-type dosage (Yan et al, 2009). Although only ∼1100 of the ∼6000 genes in yeast are essential, almost all genes become essential in specific genetic backgrounds in which another non-essential gene has been deleted or otherwise attenuated, an effect termed synthetic lethality (Tong et al, 2001). Genome-scale surveys suggest that over 200 000 binary synthetic lethal gene combinations dominate the yeast genetic landscape (Costanzo et al, 2010). The genetic buffering phenomenon is also manifest as a plethora of differential chemical–genetic interactions in the presence of sublethal doses of bioactive compounds (Hillenmeyer et al, 2008). These observations frame the difficulty of interdicting network functions in eukaryotic pathogens with single agent therapeutics. At the same time, however, this genetic network organization suggests that judicious combinations of small molecule inhibitors of both essential and non-essential targets may elicit additive or synergistic effects on cell growth (Sharom et al, 2004; Lehar et al, 2008). Unbiased screens for drugs that synergistically enhance a specific bioactive effect, but which are not themselves individually active—termed a syncretic combination—are one means to substantially elaborate chemical space (Keith et al, 2005). 
Indeed, compounds that enhance the activity of known agents in model yeast and cancer cell line systems have been identified both by focused small molecule library screens and by computational methods (Borisy et al, 2003; Lehar et al, 2007; Nelander et al, 2008; Jansen et al, 2009; Zinner et al, 2009).
To extend the stratagem of chemical synthetic lethality to clinically relevant fungal pathogens, we screened a bioactive library of known drugs for synergistic enhancers of the widely used fungistatic drug fluconazole against the clinically relevant pathogens C. albicans, C. neoformans and C. gattii, as well as the genetically tractable budding yeast S. cerevisiae. Fluconazole is an azole drug that inhibits lanosterol 14α-demethylase, the gene product of ERG11, an essential cytochrome P450 enzyme in the ergosterol biosynthetic pathway (Groll et al, 1998). We identified 148 drugs that potentiate the antifungal action of fluconazole against the four species. These syncretic compounds had not been previously recognized in the clinic as antifungal agents, and many acted in a species-specific manner, often in a potent fungicidal manner.
To understand the mechanisms of synergism, we interrogated six syncretic drugs—trifluoperazine, tamoxifen, clomiphene, sertraline, suloctidil and L-cycloserine—in genome-wide chemogenomic profiles of the S. cerevisiae deletion strain collection (Giaever et al, 1999). These profiles revealed that membrane, vesicle trafficking and lipid biosynthesis pathways are targeted by five of the synergizers, whereas the sphingolipid biosynthesis pathway is targeted by L-cycloserine. Cell biological assays confirmed the predicted membrane disruption effects of the former group of compounds, which may perturb ergosterol metabolism, impair fluconazole export by drug efflux pumps and/or affect active import of fluconazole (Kuo et al, 2010; Mansfield et al, 2010). Based on the integration of chemical–genetic and genetic interaction space, a signature set of deletion strains that are sensitive to the membrane active synergizers correctly predicted additional drug synergies with fluconazole. Similarly, the L-cycloserine chemogenomic profile correctly predicted a synergistic interaction between fluconazole and myriocin, another inhibitor of sphingolipid biosynthesis. The structure of genetic networks suggests that it should be possible to devise higher order drug combinations with even greater selectivity and potency (Sharom et al, 2004). In an initial test of this concept, we found that the combination of a non-synergistic pair drawn from the membrane active and sphingolipid target classes exhibited potent three-way synergism with a low dose of fluconazole. Finally, the combination of sertraline and fluconazole was active in a G. mellonella model of Cryptococcal infection, and was also efficacious against fluconazole-resistant clinical isolates of C. albicans and C. glabrata.
Collectively, these results demonstrate that the combinatorial redeployment of known drugs defines a powerful antifungal strategy and establish a number of potential lead combinations for future clinical assessment.
Resistance to widely used fungistatic drugs, particularly to the ergosterol biosynthesis inhibitor fluconazole, threatens millions of immunocompromised patients susceptible to invasive fungal infections. The dense network structure of synthetic lethal genetic interactions in yeast suggests that combinatorial network inhibition may afford increased drug efficacy and specificity. We carried out systematic screens with a bioactive library enriched for off-patent drugs to identify compounds that potentiate fluconazole action in pathogenic Candida and Cryptococcus strains and the model yeast Saccharomyces. Many compounds exhibited species- or genus-specific synergism, and often improved fluconazole from fungistatic to fungicidal activity. Mode of action studies revealed two classes of synergistic compound, which either perturbed membrane permeability or inhibited sphingolipid biosynthesis. Synergistic drug interactions were rationalized by global genetic interaction networks and, notably, higher order drug combinations further potentiated the activity of fluconazole. Synergistic combinations were active against fluconazole-resistant clinical isolates and an in vivo model of Cryptococcus infection. The systematic repurposing of approved drugs against a spectrum of pathogens thus identifies network vulnerabilities that may be exploited to increase the activity and repertoire of antifungal agents.
doi:10.1038/msb.2011.31
PMCID: PMC3159983  PMID: 21694716
antifungal; combination; pathogen; resistance; synergism
13.  Genetic Complexity and Quantitative Trait Loci Mapping of Yeast Morphological Traits  
PLoS Genetics  2007;3(2):e31.
Functional genomics relies on two essential parameters: the sensitivity of phenotypic measures and the power to detect genomic perturbations that cause phenotypic variations. In model organisms, two types of perturbations are widely used. Artificial mutations can be introduced in virtually any gene and allow the systematic analysis of gene function via mutant fitness. Alternatively, natural genetic variations can be associated with particular phenotypes via genetic mapping. However, the access to genome manipulation and breeding provided by model organisms is sometimes counterbalanced by phenotyping limitations. Here we investigated the natural genetic diversity of Saccharomyces cerevisiae cellular morphology using a very sensitive high-throughput imaging platform. We quantified 501 morphological parameters in over 50,000 yeast cells from a cross between two wild-type divergent backgrounds. Extensive morphological differences were found between these backgrounds. The genetic architecture of the traits was complex, with evidence of both epistasis and transgressive segregation. We mapped quantitative trait loci (QTL) for 67 traits and discovered 364 correlations between trait segregation and inheritance of gene expression levels. We validated one QTL by the replacement of a single base in the genome. This study illustrates the natural diversity and complexity of cellular traits among natural yeast strains and provides an ideal framework for a genetical genomics dissection of multiple traits. Our results did not overlap with results previously obtained from systematic deletion strains, showing that both approaches are necessary for the functional exploration of genomes.
Author Summary
A familiar face or a dog breed is easily recognized because morphology of individuals differs according to their genetic backgrounds. For single-cell organisms, morphology reduces to the shape and size of cellular features. Microbiologists noticed that the shape of S. cerevisiae cells (baker's yeast) differs from one strain to another, but these differences were usually described qualitatively. We used a high-throughput imaging platform to study the morphology of yeast cells when they divide. Cells were stained with three fluorescent dyes so that their periphery, their DNA, and their actin could be recognized, and their images were analysed by a specialized software program. Numerous morphological differences were found between two distant strains of S. cerevisiae. By crossing these two strains, we performed quantitative genetics: several loci controlling morphological variations were found on the genome, and correlations were made between gene expression and morphology changes. Using bioinformatics, we showed that the results obtained do not overlap with previous results obtained from yeast cells in which specific genes are deleted. The study, therefore, illustrates how mutagenesis and the use of natural genetic variations provide complementary knowledge.
doi:10.1371/journal.pgen.0030031
PMCID: PMC1802830  PMID: 17319748
14.  Mining for genotype-phenotype relations in Saccharomyces using partial least squares 
BMC Bioinformatics  2011;12:318.
Background
Multivariate approaches are versatile and applied in many fields because they offer decisive advantages over univariate analysis. Genome-wide association studies are rapidly emerging, but current approaches pay little attention to the multivariate relations between genotype and phenotype. We introduce a methodology based on a BLAST approach for extracting information from genomic sequences and Soft-Thresholding Partial Least Squares (ST-PLS) for mapping genotype-phenotype relations.
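The soft-thresholding step named in ST-PLS can be illustrated with the generic soft-thresholding operator, which shrinks each weight toward zero and sets small weights exactly to zero, so that only a subset of variables (here, genes) is retained. This is a minimal sketch of the generic operator only; how ST-PLS scales the weights and chooses the threshold is not described in this abstract, and the names `soft_threshold`, `loading_weights` and `delta` are illustrative assumptions.

```python
def soft_threshold(weights, delta):
    """Shrink each weight toward zero by delta; weights within delta of zero become 0."""
    if delta < 0:
        raise ValueError("delta must be non-negative")
    shrunk = []
    for w in weights:
        magnitude = max(abs(w) - delta, 0.0)
        shrunk.append(magnitude if w >= 0 else -magnitude)
    return shrunk

# Hypothetical loading weights for three genes and an assumed threshold.
loading_weights = [0.5, -0.2, 0.05]
delta = 0.1
sparse_weights = soft_threshold(loading_weights, delta)

# Indices of genes whose weight survives thresholding (assumed selection rule).
selected = [i for i, w in enumerate(sparse_weights) if w != 0]
```

With these made-up values the third weight is zeroed, which shows how a large gene set can be reduced to a small influential subset.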
Results
Applying this methodology to an extensive data set for the model yeast Saccharomyces cerevisiae, we found that the genotype-phenotype relationship involves surprisingly few genes, in the sense that an overwhelmingly large fraction of the phenotypic variation can be explained by variation in less than 1% of the full gene reference set containing 5791 genes. These phenotype-influencing genes were evolving 20% faster than non-influential genes and were unevenly distributed over cellular functions, with strong enrichments in functions such as cellular respiration and transposition. These genes were also enriched with known paralogs, stop codon variations and copy number variations, suggesting that such molecular adjustments have had a disproportionate influence on the recent adaptation of Saccharomyces yeasts to environmental changes in their ecological niche.
Conclusions
The BLAST- and PLS-based multivariate approach produced results that adhere to the known yeast phylogeny and gene ontology, and thus verify that the methodology extracts a set of fast-evolving genes that capture the phylogeny of the yeast strains. The approach is worth pursuing, and future investigations should aim to improve the computation of genotype signals as well as the variable selection procedure within the PLS framework.
doi:10.1186/1471-2105-12-318
PMCID: PMC3175482  PMID: 21812956
15.  Ribosomal DNA Sequence Heterogeneity Reflects Intraspecies Phylogenies and Predicts Genome Structure in Two Contrasting Yeast Species 
Systematic Biology  2014;63(4):543-554.
The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. 
We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of closely related organisms, and discuss how it could be extended to future studies of multilocus rDNA systems. [concerted evolution; genome hybridisation; phylogenetic analysis; ribosomal DNA; whole genome sequencing; yeast]
doi:10.1093/sysbio/syu019
PMCID: PMC4055870  PMID: 24682414
16.  Non-Coding RNA Prediction and Verification in Saccharomyces cerevisiae 
PLoS Genetics  2009;5(1):e1000321.
Non-coding RNAs (ncRNAs) play an important and varied role in cellular function. A significant amount of research has been devoted to computational prediction of these genes from genomic sequence, but the ability to do so has remained elusive due to a lack of apparent genomic features. In this work, thermodynamic stability of ncRNA structural elements, as summarized in a Z-score, is used to predict ncRNA in the yeast Saccharomyces cerevisiae. This analysis was coupled with comparative genomics to search for ncRNA genes on chromosome six of S. cerevisiae and S. bayanus. Sets of positive and negative control genes were evaluated to determine the efficacy of thermodynamic stability for discriminating ncRNA from background sequence. The effect of window sizes and step sizes on the sensitivity of ncRNA identification was also explored. Non-coding RNA gene candidates, common to both S. cerevisiae and S. bayanus, were verified using northern blot analysis, rapid amplification of cDNA ends (RACE), and publicly available cDNA library data. Four ncRNA transcripts are well supported by experimental data (RUF10, RUF11, RUF12, RUF13), while one additional putative ncRNA transcript is supported by data that are not entirely conclusive. Six candidates appear to be structural elements in 5′ or 3′ untranslated regions of annotated protein-coding genes. This work shows that thermodynamic stability, coupled with comparative genomics, can be used to predict ncRNA with significant structural elements.
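The Z-score and sliding-window ideas mentioned above can be sketched as follows. This is an illustrative outline only, assuming the common definition in which the folding energy of a native window is compared with the energies of shuffled versions of the same window; the abstract does not give the authors' exact procedure. The folding energies themselves would come from an external RNA folding program, which is not modelled here, and the function names and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev

def windows(sequence, size, step):
    """Yield (start, subsequence) for each full-length window along the sequence."""
    for start in range(0, len(sequence) - size + 1, step):
        yield start, sequence[start:start + size]

def stability_z_score(native_energy, shuffled_energies):
    """Z-score of a native folding energy against energies of shuffled controls.

    Strongly negative values mean the native window folds more stably
    (lower free energy) than expected for its composition.
    """
    mu = mean(shuffled_energies)
    sigma = pstdev(shuffled_energies)  # population SD; an assumed choice
    if sigma == 0:
        raise ValueError("shuffled energies have zero variance")
    return (native_energy - mu) / sigma

# Hypothetical energies (kcal/mol) for one window and four shuffled controls.
z = stability_z_score(-10.0, [-5.0, -5.0, -7.0, -3.0])

# Window size 4 and step 3 over a toy sequence.
toy_windows = list(windows("ACGTACGTAC", size=4, step=3))
```

Smaller steps give denser coverage at higher computational cost, which is one reason the choice of window and step size affects sensitivity.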
Author Summary
Recent advances in DNA sequence technology have made it possible to sequence entire genomes. Once a genome is sequenced, it becomes necessary to identify the set of genes and other functional elements within the genome. This is particularly challenging as much of the genomic sequence does not appear to perform any function and is loosely referred to as “junk.” Identifying functional elements among the “junk” is difficult. Experimental methods have been developed for this purpose but they are time-consuming, expensive, and often provide an incomplete picture. Thus, it is important to develop the ability to identify these functional elements using computational methods. Protein-coding genes are relatively easy to identify computationally, but other categories of functional elements present a significantly greater challenge. In this work, we used a computational approach to identify genes that do not encode for a protein but rather function as an RNA molecule. We then used experimental methods to verify our predictions and thereby validate the computational method.
doi:10.1371/journal.pgen.1000321
PMCID: PMC2603021  PMID: 19119416
17.  A pipeline for automated annotation of yeast genome sequences by a conserved-synteny approach 
BMC Bioinformatics  2012;13:237.
Background
Yeasts are a model system for exploring eukaryotic genome evolution. Next-generation sequencing technologies are poised to vastly increase the number of yeast genome sequences, both from resequencing projects (population studies) and from de novo sequencing projects (new species). However, the annotation of genomes presents a major bottleneck for de novo projects, because it still relies on a process that is largely manual.
Results
Here we present the Yeast Genome Annotation Pipeline (YGAP), an automated system designed specifically for new yeast genome sequences lacking transcriptome data. YGAP does automatic de novo annotation, exploiting homology and synteny information from other yeast species stored in the Yeast Gene Order Browser (YGOB) database. The basic premises underlying YGAP's approach are that data from other species already tells us what genes we should expect to find in any particular genomic region and that we should also expect that orthologous genes are likely to have similar intron/exon structures. Additionally, it is able to detect probable frameshift sequencing errors and can propose corrections for them. YGAP searches intelligently for introns, and detects tRNA genes and Ty-like elements.
Conclusions
In tests on Saccharomyces cerevisiae and on the genomes of Naumovozyma castellii and Tetrapisispora blattae newly sequenced with Roche-454 technology, YGAP outperformed another popular annotation program (AUGUSTUS). For S. cerevisiae and N. castellii, 91-93% of YGAP's predicted gene structures were identical to those in previous manually curated gene sets. YGAP has been implemented as a webserver with a user-friendly interface at http://wolfe.gen.tcd.ie/annotation.
doi:10.1186/1471-2105-13-237
PMCID: PMC3507789  PMID: 22984983
Annotation; Saccharomyces; Comparative genomics
18.  The essential genome of a bacterium 
This study reports the essential Caulobacter genome at 8 bp resolution determined by saturated transposon mutagenesis and high-throughput sequencing. This strategy is applicable to full genome essentiality studies in a broad class of bacterial species.
The essential Caulobacter genome was determined at 8 bp resolution using hyper-saturated transposon mutagenesis coupled with high-throughput sequencing. Essential protein-coding sequences comprise 90% of the essential genome; the remaining 10% comprises essential non-coding RNA sequences, gene regulatory elements and essential genome replication features. Of the 3876 annotated open reading frames (ORFs), 480 (12.4%) were essential ORFs, 3240 (83.6%) were non-essential ORFs and 156 (4.0%) were ORFs that severely impacted fitness when mutated. The essential elements are preferentially positioned near the origin and terminus of the Caulobacter chromosome. This high-resolution strategy is applicable to high-throughput, full genome essentiality studies and large-scale genetic perturbation experiments in a broad class of bacterial species.
The regulatory events that control polar differentiation and cell-cycle progression in the bacterium Caulobacter crescentus are highly integrated, and they have to occur in the proper order (McAdams and Shapiro, 2011). Components of the core regulatory circuit are largely known. Full discovery of its essential genome, including non-coding, regulatory and coding elements, is a prerequisite for understanding the complete regulatory network of this bacterial cell. We have identified all the essential coding and non-coding elements of the Caulobacter chromosome using a hyper-saturated transposon mutagenesis strategy that is scalable and can be readily extended to obtain rapid and accurate identification of the essential genome elements of any sequenced bacterial species at a resolution of a few base pairs.
We engineered a Tn5 derivative transposon (Tn5Pxyl) that carries at one end an inducible outward pointing Pxyl promoter (Christen et al, 2010). We showed that this transposon construct inserts into the genome randomly where it can activate or disrupt transcription at the site of integration, depending on the insertion orientation. DNA from hundreds of thousands of transposon insertion sites reading outward into flanking genomic regions was PCR amplified in parallel and sequenced by Illumina paired-end sequencing to locate the insertion site in each mutant strain (Figure 1). A single sequencing run on DNA from a mutagenized cell population yielded 118 million raw sequencing reads. Of these, >90 million (>80%) read outward from the transposon element into adjacent genomic DNA regions and the insertion site could be mapped with single nucleotide resolution. This yielded the location and orientation of 428 735 independent transposon insertions in the 4-Mbp Caulobacter genome.
Within non-coding sequences of the Caulobacter genome, we detected 130 non-disruptable DNA segments between 90 and 393 bp long in addition to all essential promoter elements. Among 27 previously identified and validated sRNAs (Landt et al, 2008), three were contained within non-disruptable DNA segments and another three were partially disruptable, that is, insertions caused a notable growth defect. Two additional small RNAs found to be essential are the transfer-messenger RNA (tmRNA) and the ribozyme RNAseP (Landt et al, 2008). In addition to the 8 non-disruptable sRNAs, 29 out of the 130 intergenic essential non-coding sequences contained non-redundant tRNA genes; duplicated tRNA genes were non-essential. We also identified two non-disruptable DNA segments within the chromosomal origin of replication. Thus, we resolved essential non-coding RNAs, tRNAs and essential replication elements within the origin region of the chromosome. An additional 90 non-disruptable small genome elements of currently unknown function were identified. Eighteen of these are conserved in at least one closely related species. Only 2 could encode a protein of over 50 amino acids.
For each of the 3876 annotated open reading frames (ORFs), we analyzed the distribution, orientation, and genetic context of transposon insertions. There are 480 essential ORFs and 3240 non-essential ORFs. In addition, there were 156 ORFs that severely impacted fitness when mutated. The 8-bp resolution allowed a dissection of the essential and non-essential regions of the coding sequences. Sixty ORFs had transposon insertions within a significant portion of their 3′ region but lacked insertions in the essential 5′ coding region, allowing the identification of non-essential protein segments. For example, transposon insertions in the essential cell-cycle regulatory gene divL, a tyrosine kinase, showed that the last 204 C-terminal amino acids did not impact viability, confirming previous reports that the C-terminal ATPase domain of DivL is dispensable for viability (Reisinger et al, 2007; Iniesta et al, 2010). In addition, we found that 30 out of 480 (6.3%) of the essential ORFs appear to be shorter than the annotated ORF, suggesting that these are probably mis-annotated.
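The ORF tallies above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the text; the helper name `percent` and the half-up rounding to one decimal place are assumptions made to reproduce the reported percentages.

```python
from decimal import Decimal, ROUND_HALF_UP

TOTAL_ORFS = 3876
orf_counts = {
    "essential": 480,
    "non-essential": 3240,
    "high fitness cost": 156,
}

def percent(part, whole):
    """Percentage of part in whole, rounded half-up to one decimal place."""
    value = Decimal(part) * 100 / Decimal(whole)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# The three categories should account for every annotated ORF.
categories_sum = sum(orf_counts.values())

# 12.4, 83.6 and 4.0 percent, matching the figures reported for this study.
orf_percent = {name: percent(n, TOTAL_ORFS) for name, n in orf_counts.items()}

# 30 of the 480 essential ORFs appear shorter than annotated: 6.25%, reported as 6.3%.
short_orf_percent = percent(30, orf_counts["essential"])
```

Exact decimal arithmetic is used here because 6.25 sits on a rounding boundary, where Python's built-in `round` would give 6.2 instead of the reported 6.3.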
Among the 480 ORFs essential for growth on rich media, there were 10 essential transcriptional regulatory proteins, including 5 previously identified cell-cycle regulators (McAdams and Shapiro, 2003; Holtzendorff et al, 2004; Collier and Shapiro, 2007; Gora et al, 2010; Tan et al, 2010) and 5 uncharacterized predicted transcription factors. In addition, two RNA polymerase sigma factors RpoH and RpoD, as well as the anti-sigma factor ChrR, which mitigates rpoE-dependent stress response under physiological growth conditions (Lourenco and Gomes, 2009), were also found to be essential. Thus, a set of 10 transcription factors, 2 RNA polymerase sigma factors and 1 anti-sigma factor are the core essential transcriptional regulators for growth on rich media. To further characterize the core components of the Caulobacter cell-cycle control network, we identified all essential regulatory sequences and operon transcripts. Altogether, the 480 essential protein-coding and 37 essential RNA-coding Caulobacter genes are organized into operons such that 402 individual promoter regions are sufficient to regulate their expression. Of these 402 essential promoters, the transcription start sites (TSSs) of 105 were previously identified (McGrath et al, 2007).
The essential genome features are non-uniformly distributed on the Caulobacter genome and enriched near the origin and the terminus regions. In contrast, the chromosomal positions of the published E. coli essential coding sequences (Rocha, 2004) are preferentially located at either side of the origin (Figure 4A). This indicates that there are selective pressures on chromosomal positioning of some essential elements (Figure 4A).
The strategy described in this report could be readily extended to quickly determine the essential genome for a large class of bacterial species.
Caulobacter crescentus is a model organism for the integrated circuitry that runs a bacterial cell cycle. Full discovery of its essential genome, including non-coding, regulatory and coding elements, is a prerequisite for understanding the complete regulatory network of a bacterial cell. Using hyper-saturated transposon mutagenesis coupled with high-throughput sequencing, we determined the essential Caulobacter genome at 8 bp resolution, including 1012 essential genome features: 480 ORFs, 402 regulatory sequences and 130 non-coding elements, including 90 intergenic segments of unknown function. The essential transcriptional circuitry for growth on rich media includes 10 transcription factors, 2 RNA polymerase sigma factors and 1 anti-sigma factor. We identified all essential promoter elements for the cell cycle-regulated genes. The essential elements are preferentially positioned near the origin and terminus of the chromosome. The high-resolution strategy used here is applicable to high-throughput, full genome essentiality studies and large-scale genetic perturbation experiments in a broad class of bacterial species.
doi:10.1038/msb.2011.58
PMCID: PMC3202797  PMID: 21878915
functional genomics; next-generation sequencing; systems biology; transposon mutagenesis
19.  Evolution of Mutational Robustness in the Yeast Genome: A Link to Essential Genes and Meiotic Recombination Hotspots 
PLoS Genetics  2009;5(6):e1000533.
Deleterious mutations inevitably emerge in any evolutionary process and are speculated to decisively influence the structure of the genome. Meiosis, which is thought to play a major role in handling mutations on the population level, recombines chromosomes via non-randomly distributed hot spots for meiotic recombination. In many genomes, various types of genetic elements are distributed in patterns that are currently not well understood. In particular, important (essential) genes are arranged in clusters, which often cannot be explained by a functional relationship of the involved genes. Here we show by computer simulation that essential gene (EG) clustering provides a fitness benefit in handling deleterious mutations in sexual populations with variable levels of inbreeding and outbreeding. We find that recessive lethal mutations enforce a selective pressure towards clustered genome architectures. Our simulations correctly predict (i) the evolution of non-random distributions of meiotic crossovers, (ii) the genome-wide anti-correlation of meiotic crossovers and EG clustering, (iii) the evolution of EG enrichment in pericentromeric regions and (iv) the associated absence of meiotic crossovers (cold centromeres). Our results furthermore predict optimal crossover rates for yeast chromosomes, which match the experimentally determined rates. Using a Saccharomyces cerevisiae conditional mutator strain, we show that haploid lethal phenotypes result predominantly from mutation of single loci and generally do not impair mating, which leads to an accumulation of mutational load following meiosis and mating. We hypothesize that purging of deleterious mutations in essential genes constitutes an important factor driving meiotic crossover. Therefore, the increased robustness of populations to deleterious mutations, which arises from clustered genome architectures, may provide a significant selective force shaping crossover distribution. 
Our analysis reveals a new aspect of the evolution of genome architectures that complements insights about molecular constraints, such as the interference of pericentromeric crossovers with chromosome segregation.
Author Summary
Sexual life cycles constitute a costly alternative to vegetative modes of reproduction. Two categories of hypotheses seek to explain why sexual life cycles exist: those investigating the selective advantages that have driven the evolution of individual parts of this life cycle and those rationalizing the advantages sexual life cycles may offer as a whole, e.g., in extant species. Sex and recombination can be understood as efficient ways to interact with mutations and their consequences. Mutations occur at random and are mostly either deleterious or neutral. A prominent hypothesis suggests that sex and recombination are advantageous since they enhance the purging of such deleterious mutations and create individuals with a lower than average deleterious load. Deleterious mutations should co-determine the parameters that govern recombination of genomes in meiosis. Using an evolutionary computer simulation of diploid, unicellular sexual populations, we show that recessive lethal mutations can drive the evolution of chromosome architectures, in which essential genes become genetically linked into clusters. Evolved architectures exhibit structural properties and fitness similar to digitized yeast chromosomes and provide mutational purging capabilities superior to those of randomly generated or unclustered architectures. Our study demonstrates the importance of sexual cycles in the context of lethal mutations.
doi:10.1371/journal.pgen.1000533
PMCID: PMC2694357  PMID: 19557188
20.  Clustering phenotype populations by genome-wide RNAi and multiparametric imaging 
How to predict gene function from phenotypic cues is a longstanding question in biology. Using quantitative multiparametric imaging, RNAi-mediated cell phenotypes were measured on a genome-wide scale. On the basis of phenotypic ‘neighbourhoods’, we identified previously uncharacterized human genes as mediators of the DNA damage response pathway and the maintenance of genomic integrity. The phenotypic map is provided as an online resource at http://www.cellmorph.org for discovering further functional relationships for a broad spectrum of biological modules.
Genetic screens for phenotypic similarity have made key contributions to associating genes with biological processes. Aggregating genes by similarity of their loss-of-function phenotype has provided insights into signalling pathways that have a conserved function from Drosophila to human (Nusslein-Volhard and Wieschaus, 1980; Bier, 2005). Complex visual phenotypes, such as defects in pattern formation during development, greatly facilitated the classification of genes into pathways, and phenotypic similarities in many cases predicted molecular relationships. With RNA interference (RNAi), highly parallel phenotyping of loss-of-function effects in cultured cells has become feasible in many organisms whose genomes have been sequenced (Boutros and Ahringer, 2008). One of the current challenges is the computational categorization of visual phenotypes and the prediction of gene function and associated biological processes. With large parts of the genome still in uncharted territory, deriving functional information from large-scale phenotype analysis promises to uncover novel gene–gene relationships and to generate functional maps to explore cellular processes.
In this study, we developed an automated approach using RNAi-mediated cell phenotypes, multiparametric imaging and computational modelling to obtain functional information on previously uncharacterized genes. To generate broad, computer-readable phenotypic signatures, we measured the effect of RNAi-mediated knockdowns on changes of cell morphology in human cells on a genome-wide scale. First, the several million cells were stained for nuclear and cytoskeletal markers and then imaged using automated microscopy. On the basis of fluorescent markers, we established an automated image analysis to classify individual cells (Figure 1A). After cell segmentation for determining nuclei and cell boundaries (Figure 1C), we computed 51 cell descriptors that quantified intensities, shape characteristics and texture (Figure 1F). Individual cells were categorized into 1 of 10 classes, which included cells showing protrusion/elongation, cells in metaphase, large cells, condensed cells, cells with lamellipodia and cellular debris (Figure 1D and E). Each siRNA knockdown was summarized by a phenotypic profile and differences between RNAi knockdowns were quantified by the similarity between phenotypic profiles. We termed the vector of scores a phenoprint (Figure 3C) and defined the phenotypic distance between a pair of perturbations as the distance between their corresponding phenoprints.
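The notion of a phenotypic distance between phenoprints can be sketched as follows. This is a minimal example that assumes a phenoprint is a plain list of numeric scores and that distance is Euclidean; the abstract does not name the metric used, and the function names and gene labels in the usage note are hypothetical.

```python
import math


def phenotypic_distance(phenoprint_a, phenoprint_b):
    """Euclidean distance between two equal-length score vectors (assumed metric)."""
    if len(phenoprint_a) != len(phenoprint_b):
        raise ValueError("phenoprints must have the same number of scores")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(phenoprint_a, phenoprint_b)))


def nearest_neighbours(query, phenoprints, k):
    """Return the k perturbations whose phenoprints lie closest to the query's.

    phenoprints maps a perturbation name to its score vector.
    """
    ranked = sorted(
        (phenotypic_distance(phenoprints[query], vector), name)
        for name, vector in phenoprints.items()
        if name != query
    )
    return [name for _, name in ranked[:k]]
```

With `phenoprints = {"gene_a": [0, 0], "gene_b": [1, 0], "gene_c": [5, 5]}`, the call `nearest_neighbours("gene_a", phenoprints, 1)` picks out `gene_b`, which is the kind of neighbourhood lookup the phenotypic map supports.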
To visualize the distribution of all phenoprints, we plotted them in a genome-wide map as a two-dimensional representation of the phenotypic similarity relationships (Figure 3A). The complete data set and an interactive version of the phenotypic map are available at http://www.cellmorph.org. The map identified phenotypic ‘neighbourhoods’, which are characterized by cells with lamellipodia (WNK3, ANXA4), cells with prominent actin fibres (ODF2, SOD3), abundance of large cells (CA14), many elongated cells (SH2B2, ELMO2), decrease in cell number (TPX2, COPB1, COPA), increase in number of cells in metaphase (BLR1, CIB2) and combinations of phenotypes such as presence of large cells with protrusions and bright nuclei (PTPRZ1, RRM1; Figure 3B).
To test whether phenotypic similarity might serve as a predictor of gene function, we focused our further analysis on two clusters that contained genes associated with the DNA damage response (DDR) and genomic integrity (Figure 3A and C). The first phenotypic cluster included proteins with kinetochore-associated functions such as NUF2 (Figure 3B) and SGOL1. It also contained the centrosomal protein CEP164 that has been described as an important mediator of the DNA damage-activated signalling cascade (Sivasubramaniam et al, 2008) and the largely uncharacterized genes DONSON and SON. A second phenotypically distinct cluster included previously described components of the DDR pathway such as RRM1 (Figure 3A–C), CLSPN, PRIM2 and SETD8. Furthermore, this cluster contained the poorly characterized genes CADM1 and CD3EAP.
Cells activate a signalling cascade in response to DNA damage induced by exogenous and endogenous factors. Central are the kinases ATM and ATR as they serve as sensors of DNA damage and activators of further downstream kinases (Harper and Elledge, 2007; Cimprich and Cortez, 2008). To investigate whether DONSON, SON, CADM1 and CD3EAP, which were found in phenotypic ‘neighbourhoods’ to known DDR components, have a role in the DNA damage signalling pathway, we tested the effect of their depletion on the DDR on γ irradiation. As indicated by reduced CHEK1 phosphorylation, siRNA knockdown of DONSON, SON, CD3EAP or CADM1 resulted in impaired DDR signalling on γ irradiation. Furthermore, knockdown of DONSON or SON reduced phosphorylation of downstream effectors such as NBS1, CHEK1 and the histone variant H2AX on UVC irradiation. DONSON depletion also impaired recruitment of RPA2 onto chromatin and SON knockdown reduced RPA2 phosphorylation, indicating that DONSON and SON presumably act downstream of the activation of ATM. In agreement with their phenotypic profiles, these results suggest that DONSON, SON, CADM1 and CD3EAP are important mediators of the DDR. Further experiments demonstrated that they are also required for the maintenance of genomic integrity.
In summary, we show that genes with similar phenotypic profiles tend to share similar functions. The power of our computational and experimental approach is demonstrated by the identification of novel signalling regulators whose phenotypic profiles were found in proximity to known biological modules. Therefore, we believe that such phenotypic maps can serve as a resource for functional discovery and characterization of unknown genes. Furthermore, such approaches are also applicable for other perturbation reagents, such as small molecules in drug discovery and development. One could also envision combined maps that contain both siRNAs and small molecules to predict target–small molecule relationships and potential side effects.
Genetic screens for phenotypic similarity have made key contributions to associating genes with biological processes. With RNA interference (RNAi), highly parallel phenotyping of loss-of-function effects in cells has become feasible. One of the current challenges however is the computational categorization of visual phenotypes and the prediction of biological function and processes. In this study, we describe a combined computational and experimental approach to discover novel gene functions and explore functional relationships. We performed a genome-wide RNAi screen in human cells and used quantitative descriptors derived from high-throughput imaging to generate multiparametric phenotypic profiles. We show that profiles predicted functions of genes by phenotypic similarity. Specifically, we examined several candidates including the largely uncharacterized gene DONSON, which shared phenotype similarity with known factors of DNA damage response (DDR) and genomic integrity. Experimental evidence supports that DONSON is a novel centrosomal protein required for DDR signalling and genomic integrity. Multiparametric phenotyping by automated imaging and computational annotation is a powerful method for functional discovery and mapping the landscape of phenotypic responses to cellular perturbations.
doi:10.1038/msb.2010.25
PMCID: PMC2913390  PMID: 20531400
DNA damage response signalling; massively parallel phenotyping; phenotype networks; RNAi screening
21.  Genome-wide transcriptional plasticity underlies cellular adaptation to novel challenge 
By recruiting the essential HIS3 gene to the GAL regulatory system and switching to a repressing glucose medium, we confronted yeast cells with a novel challenge they had not encountered before along their history in evolution. Adaptation to this challenge involved a global transcriptional response of a sizeable fraction of the genome, which relaxed on the time scale of the population adaptation, of the order of 10 generations. For a large fraction of the responding genes there is no simple biological interpretation connecting them to the specific cellular demands imposed by the novel challenge. Strikingly, repeating the experiment reproduced similar transcription patterns neither in the transient phase nor in the adapted state in glucose. These results suggest that physiological selection operates on the new metabolic configurations generated by the non-specific large scale transcriptional response to eventually stabilize an adaptive state.
Cells adjust their transcriptional state to accommodate environmental and genetic perturbations. Some common perturbations, such as changes in nutrient composition, elicit well-characterized transcriptional responses that can be understood by simple engineering-like design principles as satisfying specific demands imposed by the perturbation. However, cells also have the ability to adapt to novel and unforeseen challenges. This ability is central in realizing the evolvability potential of cells as they respond to dramatic genetic or environmental changes along evolution. Little is known about the mechanisms underlying such adaptations to novel challenges; in particular, the role of the transcriptional regulatory network in such adaptations has not been characterized. Genome-wide measurements have revealed that, in many cases, perturbations lead to a global transcriptional response involving a sizeable fraction of the genome (Gasch et al, 2000; Jelinsky et al, 2000; Causton et al, 2001; Ideker et al, 2001; Lai et al, 2005). Such global behavior suggests that general collective properties of the genetic network, rather than specific pre-designed pathways, determine an important part of the transcriptional response. It is not known however what fraction of genes within such massive transcriptional responses is essential to the specific cellular demands. It is also unknown whether the non-pre-designed part of the response can have a functional role in adaptation to novel challenges.
To study these questions, we confronted yeast cells with a novel challenge they had not encountered before along their history in evolution. A strain of the yeast Saccharomyces cerevisiae was engineered to recruit the gene HIS3, which encodes an essential enzyme of the histidine biosynthesis pathway (Hinnebusch, 1992), to the GAL regulatory system, responsible for galactose utilization (Stolovicki et al, 2006). The GAL system is known to be strongly repressed when the cells are exposed to glucose. Therefore, upon switching to a medium containing glucose and lacking histidine, the GAL system and with it HIS3 are highly repressed immediately following the switch and the cells encounter a severe challenge. We have recently shown that a cell population carrying this rewired genome can adapt to grow competitively in a chemostat in a medium containing pure glucose (Stolovicki et al, 2006). This adaptation occurred on a timescale of ∼10 generations; applying a stronger environmental pressure in the form of a competitive inhibitor to HIS3 (3AT) resulted in a similar adaptation albeit with a longer timescale. Figure 1 shows the dynamics of the population's cell density (blue lines, measured by OD) following a medium switch from galactose to glucose in the chemostat without (A) and with (B) 3AT. The experiments revealed that adaptation occurs on physiological timescales (much shorter than required by spontaneous random mutations), but the mechanisms underlying this adaptation have remained unclear (Stolovicki et al, 2006).
Yeast cells had not encountered recruitment of HIS3 to the GAL system along their evolutionary history, and their genome could not possibly have been selected to specifically address glucose repression of HIS3. This experiment, therefore, provides a unique opportunity to characterize the spontaneous transcriptional response during adaptation to a novel challenge and to assess the functional role of the regulatory system in this adaptation. We used DNA microarrays to measure the genome-wide expression levels at time points along the adaptation process, with and without 3AT. These measurements revealed that a sizeable fraction of the genome responded by induction or repression to the switch into glucose. Superimposed on the OD traces, Figure 1 shows the results of a clustering analysis of the expression of genes as measured by the arrays along time in the experiments. This analysis revealed two dominant clusters, each containing hundreds of genes in each experiment, which responded to the medium switch to glucose by a strong transient induction or repression followed by relaxation to steady state on the timescale of the adaptation process, ∼ 10 generations. The two clusters in each experiment show similar but opposite dynamics.
A detailed analysis of the gene content in the two clusters revealed that only a small portion of the response was induced by a change in carbon source (15% overlap between the corresponding clusters in the two experiments, with and without 3AT). Moreover, it revealed a very low overlap with the universal stress response observed for a wide range of environmental stresses (Gasch et al, 2000; Causton et al, 2001) and with the typical response to amino-acid starvation (Natarajan et al, 2001). Additionally, all known specific responses to stress in the literature are characterized by transient induction or repression with relaxation to steady state within a generation time (Gasch et al, 2000; Koerkamp et al, 2002; Wu et al, 2004), whereas in our experiments relaxation of the transcriptional response occurs over many generations. Taken together, these results show that the transcriptional response observed here is neither a metabolic response to the change in carbon source nor is it a standard response to stress or amino-acid starvation. This raises the possibility that it is a spontaneous collective response that is largely composed of genes that do not have a specific function. This possibility was tested directly by repeating the experiment with different populations and comparing their responses. This procedure revealed reproducible adaptation dynamics and steady states in terms of population density, but showed significantly different transcriptional transient responses and steady states for the two repeated experiments. Thus, a significant portion of the genes that changed their expression during the adaptation process do not have a well-defined and reproducible function in the challenging environment.
The application of a stronger environmental pressure in the form of 3AT had a dramatic effect on the global characteristics of the transcriptional response: it induced a markedly higher correlation among the hundreds of responding genes. Figure 3A compares the array data in color code for the two experiments. It is seen that the emergent pattern of transcription exhibits a higher degree of order by the introduction of high external pressure. Observation of the transcriptional patterns for specific metabolic pathways illustrates the different contributions to the correlated dynamics (Figure 3B–D). A general energetic module such as glycolysis exhibited similar patterns of induction and relaxation in experiments with and without 3AT (Figure 3B). However, in general, we found that more than one-third of the known metabolic modules (30 out of 88 modules described in KEGG) exhibited high expression correlation among their genes when the environmental pressure was high but not when it was low. As an example, Figure 3C shows the histidine biosynthesis pathway and Figure 3D the purine pathway. Note the highly ordered trajectories in the lower panels (with 3AT) compared to the disordered ones in the upper panels (no 3AT). This order extends also between genes belonging to different and even distant metabolic modules. It indicates that a global transcriptional regulatory mechanism is in operation, rather than a local specific one. Surprisingly, genes belonging to the same metabolic pathway exhibited simultaneous positively and negatively correlated dynamics. Thus, an important conclusion of this work is that the global transcriptional response to a novel challenge cannot be explained by a simple cellular or metabolic logic. This is to be expected if the response had not been specifically selected in evolution and was not pre-designed for the challenge.
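One way to put a number on "expression correlation among the genes of a module" is the mean pairwise Pearson correlation of their time courses, sketched below. This is an illustrative assumption rather than the authors' stated procedure; the function names are hypothetical, and the `use_absolute` switch is included because genes within one pathway were reported to show both positively and negatively correlated dynamics.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def mean_pairwise_correlation(profiles, use_absolute=False):
    """Average correlation over all gene pairs in a module.

    profiles is a list of expression time courses, one per gene.
    With use_absolute=True, anti-correlated pairs count as ordered too.
    """
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("need at least two profiles")
    values = [pearson(a, b) for a, b in pairs]
    if use_absolute:
        values = [abs(v) for v in values]
    return sum(values) / len(values)
```

A module whose genes move in lockstep, some up and some down, scores near zero or below on the signed average but near one on the absolute average, which is why the two variants are worth distinguishing when comparing the runs with and without 3AT.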
Our data clearly reveal that the massive transcriptional response underlies the adaptation process to a novel challenge. The novelty of the challenge presented to the cells excludes the possibility that this response has been specifically selected toward this challenge. Thus, transcriptional regulation has dynamic properties resulting in a general massive nonspecific response to a novel perturbation. Such a response in turn allows for metabolic rearrangements, which by feeding back on transcription lead to adaptation of the cells to the unforeseen situation. The drastic change in the expression state of the cell opens multiple new metabolic pathways. Physiological selection works then on these multiple metabolic pathways to stabilize an adaptive state that causes relaxation of the perturbed expression pattern. This scenario, involving the creation of a library of possibilities and physiological selection over this library, is compatible with our understanding of a broad class of biological systems, placing the cellular metabolic/regulatory networks on the same footing as the neural or the immune systems (Gerhart and Kirschner, 1997).
Cells adjust their transcriptional state to accommodate environmental and genetic perturbations. An open question is to what extent transcriptional response to perturbations has been specifically selected along evolution. To test the possibility that transcriptional reprogramming does not need to be ‘pre-designed’ to lead to an adaptive metabolic state on physiological timescales, we confronted yeast cells with a novel challenge they had not previously encountered. We rewired the genome by recruiting an essential gene, HIS3, from the histidine biosynthesis pathway to a foreign regulatory system, the GAL network responsible for galactose utilization. Switching medium to glucose in a chemostat caused repression of the essential gene and presented the cells with a severe challenge to which they adapted over approximately 10 generations. Using genome-wide expression arrays, we show here that a global transcriptional reprogramming (>1200 genes) underlies the adaptation. A large fraction of the responding genes is nonreproducible in repeated experiments. These results show that a nonspecific transcriptional response reflecting the natural plasticity of the regulatory network supports adaptation of cells to novel challenges.
doi:10.1038/msb4100147
PMCID: PMC1865588  PMID: 17453047
adaptation; cellular metabolism; expression arrays; plasticity; transcriptional response
22.  A Mimicking-of-DNA-Methylation-Patterns Pipeline for Overcoming the Restriction Barrier of Bacteria 
PLoS Genetics  2012;8(9):e1002987.
Genetic transformation of bacteria harboring multiple Restriction-Modification (R-M) systems is often difficult using conventional methods. Here, we describe a mimicking-of-DNA-methylation-patterns (MoDMP) pipeline to address this problem in three difficult-to-transform bacterial strains. Twenty-four putative DNA methyltransferases (MTases) from these difficult-to-transform strains were cloned and expressed in an Escherichia coli strain lacking all of the known R-M systems and orphan MTases. Thirteen of these MTases exhibited DNA modification activity in Southwestern dot blot or Liquid Chromatography–Mass Spectrometry (LC–MS) assays. The active MTase genes were assembled into three operons using the Saccharomyces cerevisiae DNA assembler and were co-expressed in the E. coli strain lacking known R-M systems and orphan MTases. Thereafter, results from the dot blot and restriction enzyme digestion assays indicated that the DNA methylation patterns of the difficult-to-transform strains are mimicked in these E. coli hosts. The transformation of the Gram-positive Bacillus amyloliquefaciens TA208 and B. cereus ATCC 10987 strains with the shuttle plasmids prepared from MoDMP hosts showed increased efficiencies (up to four orders of magnitude) compared to those using the plasmids prepared from the E. coli strain lacking known R-M systems and orphan MTases or its parental strain. Additionally, the gene coding for uracil phosphoribosyltransferase (upp) was directly inactivated using non-replicative plasmids prepared from the MoDMP host in B. amyloliquefaciens TA208. Moreover, the Gram-negative chemoautotrophic Nitrobacter hamburgensis strain X14 was transformed and expressed Green Fluorescent Protein (GFP). Finally, the sequence specificities of active MTases were identified by restriction enzyme digestion, making the MoDMP system potentially useful for other strains. The effectiveness of the MoDMP pipeline in different bacterial groups suggests a universal potential. 
This pipeline could facilitate the functional genomics of the strains that are difficult to transform.
Author Summary
Approximately 95% of the genome-sequenced bacteria harbor Restriction-Modification (R-M) systems. R-M systems usually occur in pairs, i.e., DNA methyltransferases (MTases) and restriction endonucleases (REases). REases can degrade invading DNA to protect the cell from infection by phages. This protective machinery has also become a barrier to experimental genetic manipulation, because newly introduced DNA is degraded by the REases of the transformed bacteria. In this study we have developed a pipeline to protect DNA by methylation from cleavage by host REases. Multiple DNA MTases were cloned from three difficult-to-transform bacterial strains and co-expressed in an E. coli strain lacking all of the known endogenous R-M systems and orphan MTases. Thus, the DNA methylation patterns of these E. coli strains became similar to those of the difficult-to-transform strains. Ultimately, the DNA prepared from these E. coli strains can overcome the R-M barrier of the bacterial strains that are difficult to transform and achieve genetic manipulation. The effectiveness of this pipeline in different bacterial groups suggests a universal potential. This pipeline could facilitate functional genomics of bacterial strains that are difficult to transform.
doi:10.1371/journal.pgen.1002987
PMCID: PMC3459991  PMID: 23028379
23.  Yeast genome analysis identifies chromosomal translocation, gene conversion events and several sites of Ty element insertion 
Nucleic Acids Research  2009;37(19):6454-6465.
Paired end mapping of chromosomal fragments has been used in human cells to identify numerous structural variations in chromosomes of individuals and of cancer cell lines; however, the molecular, biological and bioinformatics methods for this technology are still in development. Here, we present a parallel bioinformatics approach to analyze chromosomal paired-end tag (ChromPET) sequence data and demonstrate its application in identifying gene rearrangements in the model organism Saccharomyces cerevisiae. We detected several expected events, including a chromosomal rearrangement of the nonessential arm of chromosome V induced by selective pressure, rearrangements introduced during strain construction and gene conversion at the MAT locus. In addition, we discovered several unannotated Ty element insertions that are present in the reference yeast strain, but not in the reference genome sequence, suggesting a few revisions are necessary in the latter. These data demonstrate that application of the chromPET technique to a genetically tractable organism like yeast provides an easy screen for studying the mechanisms of chromosomal rearrangements during the propagation of a species.
doi:10.1093/nar/gkp650
PMCID: PMC2770650  PMID: 19710036
24.  Generation and analysis of a barcode-tagged insertion mutant library in the fission yeast Schizosaccharomyces pombe 
BMC Genomics  2012;13:161.
Background
Barcodes are unique DNA sequence tags that can be used to specifically label individual mutants. The barcode-tagged open reading frame (ORF) haploid deletion mutant collections in the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe allow for high-throughput mutant phenotyping because the relative growth of mutants in a population can be determined by monitoring the proportions of their associated barcodes. While these mutant collections have greatly facilitated genome-wide studies, mutations in essential genes are not present, and the roles of these genes are not as easily studied. To further support genome-scale research in S. pombe, we generated a barcode-tagged fission yeast insertion mutant library that has the potential of generating viable mutations in both essential and non-essential genes and can be easily analyzed using standard molecular biological techniques.
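The idea of inferring relative growth from barcode proportions can be sketched as a log2 ratio of each barcode's share of the pool at two time points. This is a minimal illustration under stated assumptions: the function names, the dictionary-of-counts input and the pseudocount of 1 are choices made for this example and are not taken from the study.

```python
import math


def barcode_fractions(counts, pseudocount=0.0):
    """Convert raw barcode read counts into fractions of the pooled total."""
    adjusted = {barcode: n + pseudocount for barcode, n in counts.items()}
    total = sum(adjusted.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {barcode: n / total for barcode, n in adjusted.items()}


def log2_fold_changes(counts_before, counts_after, pseudocount=1.0):
    """log2 change in each barcode's share of the pool between two samples.

    A positive value means the mutant became relatively more abundant.
    The pseudocount (an assumed value; it must be positive) keeps barcodes
    with zero reads from producing log(0) or a division by zero.
    """
    barcodes = set(counts_before) | set(counts_after)
    before = barcode_fractions(
        {b: counts_before.get(b, 0) for b in barcodes}, pseudocount
    )
    after = barcode_fractions(
        {b: counts_after.get(b, 0) for b in barcodes}, pseudocount
    )
    return {b: math.log2(after[b] / before[b]) for b in barcodes}
```

Because the comparison uses proportions rather than raw counts, differences in sequencing depth between the two samples cancel out; only shifts in a mutant's share of the population register as enrichment or depletion.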
Results
An insertion vector containing a selectable ura4+ marker and a random barcode was used to generate a collection of 10,000 fission yeast insertion mutants stored individually in 384-well plates and as six pools of mixed mutants. Individual barcodes are flanked by Sfi I recognition sites and can be oligomerized in a unique orientation to facilitate barcode sequencing. Independent genetic screens on a subset of mutants suggest that this library contains a diverse collection of single insertion mutations. We present several approaches to determine insertion sites.
Conclusions
This collection of S. pombe barcode-tagged insertion mutants is well-suited for genome-wide studies. Because insertion mutations may eliminate, reduce or alter the function of essential and non-essential genes, this library will contain strains with a wide range of phenotypes that can be assayed by their associated barcodes. The design of the barcodes in this library allows for barcode sequencing using next generation or standard benchtop cloning approaches.
doi:10.1186/1471-2164-13-161
PMCID: PMC3418178  PMID: 22554201
25.  Deciphering the Hybridisation History Leading to the Lager Lineage Based on the Mosaic Genomes of Saccharomyces bayanus Strains NBRC1948 and CBS380T 
PLoS ONE  2011;6(10):e25821.
Saccharomyces bayanus is a yeast species described as one of the two parents of the hybrid brewing yeast S. pastorianus. Strains CBS380T and NBRC1948 have been retained successively as pure-line representatives of S. bayanus. In the present study, sequence analyses confirmed and upgraded our previous finding: S. bayanus type strain CBS380T harbours a mosaic genome. The genome of strain NBRC1948 was also revealed to be mosaic. Both genomes were characterized by amplification and sequencing of different markers, including genes involved in maltotriose utilization or genes detected by array-CGH mapping. Sequence comparisons with public Saccharomyces spp. nucleotide sequences revealed that the CBS380T and NBRC1948 genomes are composed of: a predominant non-cerevisiae genetic background belonging to S. uvarum, a second unidentified species provisionally named S. lagerae, and several introgressed S. cerevisiae fragments. The largest cerevisiae-introgressed DNA common to both genomes totals 70kb in length and is distributed in three contigs, cA, cB and cC. These vary in terms of length and presence of MAL31 or MTY1 (maltotriose-transporter gene). In NBRC1948, two additional cerevisiae-contigs, cD and cE, totaling 12kb in length, as well as several smaller cerevisiae fragments were identified. All of these contigs were partially detected in the genomes of S. pastorianus lager strains CBS1503 (S. monacensis) and CBS1513 (S. carlsbergensis) explaining the noticeable common ability of S. bayanus and S. pastorianus to metabolize maltotriose. NBRC1948 was shown to be inter-fertile with S. uvarum CBS7001. The cross involving these two strains produced F1 segregants resembling the strains CBS380T or NRRLY-1551. This demonstrates that these S. bayanus strains were the offspring of a cross between S. uvarum and a strain similar to NBRC1948. Phylogenies established with selected cerevisiae and non-cerevisiae genes allowed us to decipher the complex hybridisation events linking S. lagerae/S. uvarum/S. cerevisiae with their hybrid species, S. bayanus/pastorianus.
doi:10.1371/journal.pone.0025821
PMCID: PMC3187814  PMID: 21998701
