Multiple large-scale analyses in yeast implicate SUMO chain function in the maintenance of higher-order chromatin structure and transcriptional repression of environmental stress response genes.
Like ubiquitin, the small ubiquitin-related modifier (SUMO) proteins can form oligomeric “chains,” but the biological functions of these superstructures are not well understood. Here, we created mutant yeast strains unable to synthesize SUMO chains (smt3allR) and subjected them to high-content microscopic screening, synthetic genetic array (SGA) analysis, and high-density transcript profiling to perform the first global analysis of SUMO chain function. This comprehensive assessment identified 144 proteins with altered localization or intensity in smt3allR cells, 149 synthetic genetic interactions, and 225 mRNA transcripts (primarily consisting of stress- and nutrient-response genes) that displayed a >1.5-fold increase in expression levels. This information-rich resource strongly implicates SUMO chains in the regulation of chromatin. Indeed, using several different approaches, we demonstrate that SUMO chains are required for the maintenance of normal higher-order chromatin structure and transcriptional repression of environmental stress response genes in budding yeast.
The mechanisms that dictate nuclear shape are largely unknown. Here we screened the budding yeast deletion collection for mutants with abnormal nuclear shape. A common phenotype was the appearance of a nuclear extension, particularly in mutants in DNA repair and chromosome segregation genes. Our data suggest that these mutations led to the abnormal nuclear morphology indirectly, by causing a checkpoint-induced cell cycle delay. Indeed, delaying cells in mitosis by other means also led to the appearance of nuclear extensions, while inactivating the DNA damage checkpoint pathway in a DNA repair mutant reduced the fraction of cells with nuclear extensions. Formation of a nuclear extension was specific to a mitotic delay, as cells arrested in S or G2 had round nuclei. Moreover, the nuclear extension always coincided with the nucleolus, while the morphology of DNA mass remained largely unchanged. Finally, we found that phospholipid synthesis continues unperturbed when cells delay in mitosis, and inhibiting phospholipid synthesis abolished the formation of nuclear extensions. Our data suggest a mechanism that promotes nuclear envelope expansion during mitosis. When mitotic progression is delayed, cells sequester the added membrane to the nuclear envelope associated with the nucleolus, possibly to avoid disruption of intra-nuclear organization.
Protein subcellular localization has been systematically characterized in budding yeast using fluorescently tagged proteins. Based on the fluorescence microscopy images, subcellular localization of many proteins can be classified automatically using supervised machine learning approaches that have been trained to recognize predefined image classes based on statistical features. Here, we present an unsupervised analysis of protein expression patterns in a set of high-resolution, high-throughput microscope images. Our analysis is based on 7 biologically interpretable features which are evaluated on automatically identified cells, and whose cell-stage dependency is captured by a continuous model for cell growth. We show that it is possible to identify most previously identified localization patterns in a cluster analysis based on these features and that similarities between the inferred expression patterns contain more information about protein function than can be explained by a previous manual categorization of subcellular localization. Furthermore, the inferred cell-stage associated to each fluorescence measurement allows us to visualize large groups of proteins entering the bud at specific stages of bud growth. These correspond to proteins localized to organelles, revealing that the organelles must be entering the bud in a stereotypical order. We also identify and organize a smaller group of proteins that show subtle differences in the way they move around the bud during growth. Our results suggest that biologically interpretable features based on explicit models of cell morphology will yield unprecedented power for pattern discovery in high-resolution, high-throughput microscopy images.
The location of a particular protein in the cell is one of the most important pieces of information that cell biologists use to understand its function. Fluorescent tags are a powerful way to determine the location of a protein in living cells. Nearly a decade ago, a collection of yeast strains was introduced, where in each strain a single protein was tagged with green fluorescent protein (GFP). Here, we show that by training a computer to accurately identify the buds of growing yeast cells, and then making simple fluorescence measurements in context of cell shape and cell stage, the computer could automatically discover most of the localization patterns (nucleus, cytoplasm, mitochondria, etc.) without any prior knowledge of what the patterns might be. Because we made the same, simple measurements for each yeast cell, we could compare and visualize the patterns of fluorescence for the entire collection of strains. This allowed us to identify large groups of proteins moving around the cell in a coordinated fashion, and to identify new, complex patterns that had previously been difficult to describe.
Screening genome-wide sets of mutants for fitness defects provides a simple but powerful approach for exploring gene function, mapping genetic networks and probing mechanisms of drug action. For yeast and other microorganisms with global mutant collections, genetic or chemical-genetic interactions can be effectively quantified by growing an ordered array of strains on agar plates as individual colonies, and then scoring the colony size changes in response to a genetic or environmental perturbation. Doing so requires efficient tools for the extraction and analysis of quantitative data. Here, we describe SGAtools (http://sgatools.ccbr.utoronto.ca), a web-based analysis system for designer genetic screens. SGAtools outlines a series of guided steps that allow the user to quantify colony sizes from images of agar plates, correct for systematic biases in the observations and calculate a fitness score relative to a control experiment. The data can also be visualized online to explore the colony sizes on individual plates, view the distribution of resulting scores, highlight genes with the strongest signal and perform Gene Ontology enrichment analysis.
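The scoring idea described for SGAtools (quantify colony sizes, correct for plate-level bias, score relative to a control) can be illustrated with a minimal Python sketch. This is not the SGAtools implementation: the function names, the plate-median normalization, and the use of a simple difference as the score are all assumptions made for illustration, and the sketch assumes every plate has a non-zero median colony size.

```python
from statistics import median


def normalize_plate(colony_sizes):
    """Divide each colony size by the plate median (assumed bias correction)."""
    plate_median = median(colony_sizes.values())
    return {strain: size / plate_median for strain, size in colony_sizes.items()}


def relative_scores(experiment_sizes, control_sizes):
    """Score each strain as normalized experiment size minus normalized control size.

    Negative scores mean the colony grew worse than expected from the control plate.
    Only strains present on both plates are scored.
    """
    experiment = normalize_plate(experiment_sizes)
    control = normalize_plate(control_sizes)
    return {
        strain: experiment[strain] - control[strain]
        for strain in experiment
        if strain in control
    }


# Hypothetical colony sizes in pixels for three strains.
control_plate = {"strain_a": 100, "strain_b": 100, "strain_c": 100}
experiment_plate = {"strain_a": 100, "strain_b": 50, "strain_c": 100}
scores = relative_scores(experiment_plate, control_plate)
```

In this toy input only `strain_b` receives a negative score, which is the kind of signal a user would then inspect in the score distribution.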
The Bck2 protein is a potent genetic regulator of cell-cycle-dependent gene expression in budding yeast. To date, most experiments have focused on assessing a potential role for Bck2 in activation of the G1/S-specific transcription factors SBF (Swi4, Swi6) and MBF (Mbp1, Swi6), yet the mechanism of gene activation by Bck2 has remained obscure. We performed a yeast two-hybrid screen using a truncated version of Bck2 and discovered six novel Bck2-binding partners including Mcm1, an essential protein that binds to and activates M/G1 promoters through Early Cell cycle Box (ECB) elements as well as to G2/M promoters. At M/G1 promoters Mcm1 is inhibited by association with two repressors, Yox1 or Yhp1, and gene activation ensues once repression is relieved by an unknown activating signal. Here, we show that Bck2 interacts physically with Mcm1 to activate genes during G1 phase. We used chromatin immunoprecipitation (ChIP) experiments to show that Bck2 localizes to the promoters of M/G1-specific genes, in a manner dependent on functional ECB elements, as well as to the promoters of G1/S and G2/M genes. The Bck2-Mcm1 interaction requires valine 69 on Mcm1, a residue known to be required for interaction with Yox1. Overexpression of BCK2 decreases Yox1 localization to the early G1-specific CLN3 promoter and rescues the lethality caused by overexpression of YOX1. Our data suggest that Yox1 and Bck2 may compete for access to the Mcm1-ECB scaffold to ensure appropriate activation of the initial suite of genes required for cell cycle commitment.
Cell-cycle-dependent gene expression is a universal feature of cell cycles, with clear transcriptional programs in yeast, bacteria, and metazoans. At the M/G1 transition, many of the up-regulated genes encode key regulators of DNA replication (CDC6) and cyclins that initiate the events of cell cycle commitment (PCL9, CLN3). The promoters of genes activated at M/G1 contain a cis-regulatory sequence called the early cell cycle box (ECB), which is bound by the MADS-box transcription factor Mcm1, as well as the repressor Yox1 or Yhp1. The ECB cluster of genes defines a crucial cell cycle window during which a cell may change its fate; yet how the regulators that appear to act at ECBs are linked to cell cycle position is unclear, and coregulators, which experience tells us must exist, were unknown. Here, we describe our discovery that Bck2, a potent cell-cycle-regulator whose function has remained obscure, functions as a cofactor for Mcm1, to induce ECB–dependent gene expression. We also show that Bck2 has a role in promoting expression of late G1 and M/G2 genes. Our genetic and biochemical experiments reveal a new pathway for regulating gene expression associated with early cell cycle commitment, a process that is highly conserved.
Systematic analysis of gene overexpression phenotypes provides insight into gene function, enzyme targets, and biological pathways. Here, we describe a novel functional genomics platform that enables a highly parallel and systematic assessment of overexpression phenotypes in pooled cultures. First, we constructed a genome-level collection of ~5100 yeast barcoder strains, each of which carries a unique barcode, enabling pooled fitness assays with a barcode microarray or sequencing readout. Second, we constructed a yeast open reading frame (ORF) galactose-induced overexpression array by generating a genome-wide set of yeast transformants, each of which carries an individual plasmid-borne and sequence-verified ORF derived from the Saccharomyces cerevisiae full-length EXpression-ready (FLEX) collection. We combined these collections genetically using synthetic genetic array methodology, generating ~5100 strains, each of which is barcoded and overexpresses a specific ORF, a set we termed “barFLEX.” Further use of synthetic genetic array methodology allows the barFLEX collection to be moved into different genetic backgrounds. As a proof-of-principle, we describe the properties of the barFLEX overexpression collection and its application in synthetic dosage lethality studies under different environmental conditions.
barFLEX array; gene overexpression; barcoders; synthetic dosage lethality
In parallel with evolutionary developments, the Hsp90 molecular chaperone system shifted from a simple prokaryotic factor into an expansive network that includes a variety of cochaperones. We have taken high-throughput genomic and proteomic approaches to better understand the abundant yeast p23 cochaperone Sba1. Our work revealed an unexpected p23 network that displayed considerable independence from known Hsp90 clients. Additionally, our data uncovered a broad nuclear role for p23, contrasting with the historical dogma of restricted cytosolic activities for molecular chaperones. Validation studies demonstrated that yeast p23 was required for proper Golgi function, ribosome biogenesis and was necessary for efficient DNA repair from a wide range of mutagens. Notably, mammalian p23 had conserved roles in these pathways as well as being necessary for proper cell mobility. Taken together, our work demonstrates that the p23 chaperone serves a broad physiological network and functions both in conjunction with and sovereign to Hsp90.
Intense experimental and theoretical efforts have been made to globally map genetic interactions, yet we still do not understand how gene-gene interactions arise from the operation of biomolecular networks. To bridge the gap between empirical and computational studies, we: i) quantitatively measure genetic interactions between ~185,000 metabolic gene pairs in Saccharomyces cerevisiae, ii) superpose the data on a detailed systems biology model of metabolism, and iii) introduce a machine-learning method to reconcile empirical interaction data with model predictions. We systematically investigate the relative impacts of functional modularity and metabolic flux coupling on the distribution of negative and positive genetic interactions. We also provide a mechanistic explanation for the link between the degree of genetic interaction, pleiotropy, and gene dispensability. Last, we demonstrate the feasibility of automated metabolic model refinement by correcting misannotations in NAD biosynthesis and confirming them by in vivo experiments.
The coordination of cell-cycle events with developmental processes is essential for the reproductive success of organisms. In Drosophila melanogaster, meiosis is tightly coupled to oocyte development, and early embryos undergo specialized S-M mitoses that are supported by maternal products. We previously showed that the small phosphoprotein α-endosulfine (Endos) is required for normal oocyte meiotic maturation and early embryonic mitoses in Drosophila. In this study, we performed a genetic screen for dominant enhancers of endos00003 and identified several genomic regions that, when deleted, lead to impaired fertility of endos00003/+ heterozygous females. We uncovered matrimony (mtrm), which encodes a Polo kinase inhibitor, as a strong dominant enhancer of endos. mtrm126 +/+ endos00003 females are sterile because of defects in early embryonic mitoses, and this phenotype is reverted by removal of one copy of polo. These results provide compelling genetic evidence that excessive Polo activity underlies the strong functional interaction between endos00003 and mtrm126. Moreover, we show that endos is required for the increased expression of Mtrm in mature oocytes, which is presumably loaded into early embryos. These data are consistent with the model that maternal endos antagonizes Polo function in the early embryo to ensure normal mitoses through its effects on Mtrm expression during late oogenesis. Finally, we also identified genomic deletions that lead to loss of viability of endos00003/+ heterozygotes, consistent with recently published studies showing that endos is required zygotically to regulate the cell cycle during development.
α-endosulfine; matrimony; polo; early embryonic cell cycle; Drosophila
Two types of drug synergy, genetic and promiscuous, are explored in S. cerevisiae. The results suggest that promiscuous synergy predominates, and that propensity to synergize is an intrinsic drug property with the potential to accelerate the search for synergistic drug combinations.
Discovered 37 synergistic interactions among antifungal chemicals.
Promiscuous synergy is the predominant form of drug synergy.
Rate of synergy is an intrinsic property of drugs that can guide searches for drug synergy.
Drug synergy allows a therapeutic effect to be achieved with lower doses of component drugs. Drug synergy can result when drugs target the products of genes that act in parallel pathways (‘specific synergy’). Such cases of drug synergy should tend to correspond to synergistic genetic interaction between the corresponding target genes. Alternatively, ‘promiscuous synergy’ can arise when one drug non-specifically increases the effects of many other drugs, for example, by increased bioavailability. To assess the relative abundance of these drug synergy types, we examined 200 pairs of antifungal drugs in S. cerevisiae. We found 38 antifungal synergies, 37 of which were novel. While 14 cases of drug synergy corresponded to genetic interaction, 92% of the synergies we discovered involved only six frequently synergistic drugs. Although promiscuity of four drugs can be explained under the bioavailability model, the promiscuity of Tacrolimus and Pentamidine was completely unexpected. While many drug synergies correspond to genetic interactions, the majority of drug synergies appear to result from non-specific promiscuous synergy.
chemical genetics; drug combinations; drug discovery; genetic interactions
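The drug synergy abstract above treats a drug's rate of synergy as an intrinsic property. A minimal Python sketch of how such a rate could be tabulated from a pairwise screen follows. The abstract does not say how synergy was scored, so the Bliss independence excess used here is an assumed, commonly used criterion rather than the study's method, and the function names and drug labels are hypothetical.

```python
def bliss_excess(inhibition_a, inhibition_b, inhibition_combo):
    """Observed combined inhibition minus the Bliss independence expectation.

    Inhibitions are fractions in [0, 1]. A positive excess suggests synergy.
    """
    expected = inhibition_a + inhibition_b - inhibition_a * inhibition_b
    return inhibition_combo - expected


def synergy_rate(drug, tested_pairs, synergistic_pairs):
    """Fraction of tested pairs containing `drug` that were called synergistic."""
    tested = [pair for pair in tested_pairs if drug in pair]
    if not tested:
        return 0.0
    hits = [pair for pair in tested if pair in synergistic_pairs]
    return len(hits) / len(tested)


# Hypothetical screen: drug_A synergizes with two of its three partners.
tested_pairs = [
    frozenset({"drug_A", "drug_B"}),
    frozenset({"drug_A", "drug_C"}),
    frozenset({"drug_A", "drug_D"}),
    frozenset({"drug_B", "drug_C"}),
]
synergistic_pairs = {
    frozenset({"drug_A", "drug_B"}),
    frozenset({"drug_A", "drug_C"}),
}
rate_a = synergy_rate("drug_A", tested_pairs, synergistic_pairs)
```

Under this reading, a promiscuous drug is one whose rate stays high across many unrelated partners, whereas specific synergy would show up as isolated synergistic pairs.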
Classical forward genetics has been foundational to modern biology, and has been the paradigm for characterizing the role of genes in shaping phenotypes for decades. In recent years, reverse genetics has been used to identify the functions of genes, via the intentional introduction of variation and subsequent evaluation in physiological, molecular, and even population contexts. These approaches are complementary and whole genome analysis serves as a bridge between the two. We report in this article the whole genome sequencing of eighteen classical mutant strains of Neurospora crassa and the putative identification of the mutations associated with corresponding mutant phenotypes. Although some strains carry multiple unique nonsynonymous, nonsense, or frameshift mutations, the combined power of limiting the scope of the search based on genetic markers and of using a comparative analysis among the eighteen genomes provides strong support for the association between mutation and phenotype. For ten of the mutants, the mutant phenotype is recapitulated in classical or gene deletion mutants in Neurospora or other filamentous fungi. From thirteen to 137 nonsense mutations are present in each strain and indel sizes are shown to be highly skewed in gene coding sequence. Significant additional genetic variation was found in the eighteen mutant strains, and this variability defines multiple alleles of many genes. These alleles may be useful in further genetic and molecular analysis of known and yet-to-be-discovered functions and they invite new interpretations of molecular and genetic interactions in classical mutant strains.
single nucleotide polymorphism; SNP; indel; comparative genomics; classical mutant
Despite the ecological and economic importance of lignin and other wood chemical components, there are few studies of the natural genetic variation that exists within plant species and its adaptive significance. We used models developed from near infra-red spectroscopy to study natural genetic variation in lignin content and monomer composition (syringyl-to-guaiacyl ratio [S/G]) as well as cellulose and extractives content, using a 16-year-old field trial of an Australian tree species, Eucalyptus globulus. We sampled 2163 progenies of 467 native trees from throughout the native geographic range of the species. The narrow-sense heritability of wood chemical traits (0.25–0.44) was higher than that of growth (0.15), but less than wood density (0.51). All wood chemical traits exhibited significant broad-scale genetic differentiation (QST = 0.34–0.43) across the species range. This differentiation exceeded that detected with putatively neutral microsatellite markers (FST = 0.09), arguing that diversifying selection has shaped population differentiation in wood chemistry. There were significant genetic correlations among these wood chemical traits at the population and additive genetic levels. However, population differentiation in the S/G ratio of lignin in particular was positively correlated with latitude (R² = 76%), which may be driven by either adaptation to climate or associated biotic factors.
tree improvement; wood chemicals; adaptation; lignin; cellulose; extractives; syringyl; guaiacyl
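The wood chemistry abstract above compares narrow-sense heritability and QST against a neutral FST. A short Python sketch of the standard textbook definitions follows. The variance components passed in are hypothetical inputs chosen for illustration, not values from the study, and the abstract does not describe how the authors estimated them.

```python
def narrow_sense_heritability(additive_variance, phenotypic_variance):
    """h^2 = V_A / V_P: the share of phenotypic variance that is additive genetic."""
    return additive_variance / phenotypic_variance


def qst(between_population_variance, within_population_additive_variance):
    """Q_ST = V_between / (V_between + 2 * V_additive_within) for a quantitative trait."""
    return between_population_variance / (
        between_population_variance + 2.0 * within_population_additive_variance
    )


def exceeds_neutral_expectation(qst_value, fst_value):
    """True when trait differentiation is greater than neutral marker differentiation."""
    return qst_value > fst_value


# Hypothetical variance components for one trait.
trait_qst = qst(between_population_variance=1.0, within_population_additive_variance=1.0)
trait_h2 = narrow_sense_heritability(additive_variance=0.25, phenotypic_variance=1.0)
```

A QST well above FST, as reported in the abstract (0.34–0.43 versus 0.09), is the pattern usually read as evidence for diversifying selection.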
Plant root systems must grow in a manner that is dictated by endogenous genetic pathways, yet sensitive to environmental input. This allows them to provide the plant with water and nutrients while navigating a heterogeneous soil environment filled with obstacles, toxins, and pests. Gravity and touch, which constitute important cues for roots growing in soil, have been shown to modulate root architecture by altering growth patterns. This is illustrated by Arabidopsis thaliana roots growing on tilted hard agar surfaces. Under these conditions, the roots are exposed to both gravity and touch stimulation. Consequently, they tend to skew their growth away from the vertical and wave along the surface. This complex growth behavior is believed to help roots avoid obstacles in nature. Interestingly, A. thaliana accessions display distinct growth patterns under these conditions, suggesting the possibility of using this variation as a tool to identify the molecular mechanisms that modulate root behavior in response to their mechanical environment. We have used the Cvi/Ler recombinant inbred line population to identify quantitative trait loci that contribute to root skewing on tilted hard agar surfaces. A combination of fine mapping for one of these QTL and microarray analysis of expression differences between Cvi and Ler root tips identifies a region on chromosome 2 as contributing to root skewing on tilted surfaces, potentially by modulating cell wall composition.
Arabidopsis; root; skewing; waving; cis-prenyltransferase
Spore germination in Saccharomyces cerevisiae is a process in which a quiescent cell begins to divide. During germination, the cell undergoes dramatic changes in cell wall and membrane composition, as well as in gene expression. To understand germination in greater detail, we screened the S. cerevisiae deletion set for germination mutants. Our results identified two genes, TRF4 and ERG6, that are required for normal germination on solid media. TRF4 is a member of the TRAMP complex that, together with the exosome, degrades RNA polymerase II transcripts. ERG6 encodes a key step in ergosterol biosynthesis. Taken together, these results demonstrate the complex nature of germination and identify two genes important in the process.
Saccharomyces cerevisiae; germination; sporulation; ERG6; TRF4
To identify genes involved in phenotypic traits, translational genomics from highly characterized model plants to poorly characterized crop plants provides a valuable source of markers to saturate a zone of interest as well as functionally characterized candidate genes. In this paper, an integrated view of the pea genetic map was developed. A series of gene markers were mapped and their best reciprocal homologs were identified on M. truncatula, L. japonicus, soybean, and poplar pseudomolecules. Based on the syntenic relationships uncovered between pea and M. truncatula, 5460 pea Unigenes were tentatively placed on the consensus map. A new bioinformatics tool, http://www.thelegumeportal.net/pea_mtr_translational_toolkit, was developed that allows the putative position of any gene sequence on the pea consensus map to be searched, and hence candidate genes to be sought among neighboring Unigenes. As an example, a promising candidate gene for the hypernodulation mutation nod3 in pea was proposed based on the map position of the likely homolog of Pub1, a M. truncatula gene involved in nodulation regulation. A broader view of pea genome evolution was obtained by revealing syntenic relationships between pea and sequenced genomes. Blocks of synteny were identified which gave new insights into the evolution of chromosome structure in Papilionoids and Eudicots. The power of the translational genomics approach was underlined.
Pisum sativum; functional consensus map; synteny; model legume species; translational genomics
Estimating the line origin of chromosomal sections from marker genotypes is a vital step in quantitative trait loci analyses of outbred line crosses. The original, and most commonly used, algorithm can only handle moderate numbers of partially informative markers. The advent of high-density genotyping with SNP chips motivates a new method because the generic sets of markers on SNP chips typically result in long stretches of partially informative markers. We validated a new method for inferring line origin, triM (tracing inheritance with Markov models), with simulated data. A realistic pattern of marker information was achieved by replicating the linkage disequilibrium from an existing chicken intercross. There were approximately 1500 SNP markers and 800 F2 individuals. The performance of triM was compared to GridQTL, which uses a variant of the original algorithm but modified for larger datasets. triM estimated the line origin with an average error of 2%, was 10% more accurate than GridQTL, considerably faster, and better at inferring positions of recombination. GridQTL could not analyze all simulated replicates and did not estimate line origin for around a third of individuals at many positions. The study shows that triM has computational benefits and improved estimation over available algorithms and is valuable for analyzing the large datasets that will be standard in future.
interval mapping; outbred line cross; line origin probabilities; hidden Markov model; SNP chip
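The line-origin abstract above names a hidden Markov model approach for partially informative markers. The following Python sketch shows a generic forward-backward pass for a single haplotype with two possible line origins. It is a simplified illustration and not the triM algorithm: the two-state model, the use of per-line allele frequencies as emission probabilities, and all names are assumptions, and it presumes that every observed allele has non-zero probability under at least one line.

```python
def line_origin_posteriors(alleles, freq_a, freq_b, recomb):
    """Posterior probability that each marker on one haplotype descends from line A.

    alleles: observed allele (0 or 1) at each marker.
    freq_a, freq_b: frequency of allele 1 in line A and line B at each marker.
    recomb: recombination fraction between marker i and marker i + 1.
    """
    n = len(alleles)

    def emission(i, state):
        p = freq_a[i] if state == 0 else freq_b[i]
        return p if alleles[i] == 1 else 1.0 - p

    def transition(i, prev, cur):
        return 1.0 - recomb[i] if prev == cur else recomb[i]

    # Forward pass, rescaled at each marker to avoid numerical underflow.
    current = [0.5 * emission(0, s) for s in (0, 1)]
    total = sum(current)
    forward = [[v / total for v in current]]
    for i in range(1, n):
        current = [
            emission(i, s)
            * sum(forward[i - 1][p] * transition(i - 1, p, s) for p in (0, 1))
            for s in (0, 1)
        ]
        total = sum(current)
        forward.append([v / total for v in current])

    # Backward pass with the same rescaling.
    backward = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        current = [
            sum(
                transition(i, s, nxt) * emission(i + 1, nxt) * backward[i + 1][nxt]
                for nxt in (0, 1)
            )
            for s in (0, 1)
        ]
        total = sum(current)
        backward[i] = [v / total for v in current]

    # Combine both passes into a per-marker posterior for line A.
    posteriors = []
    for i in range(n):
        joint = [forward[i][s] * backward[i][s] for s in (0, 1)]
        posteriors.append(joint[0] / sum(joint))
    return posteriors


# Hypothetical example: the middle marker is uninformative on its own.
example = line_origin_posteriors(
    alleles=[1, 1, 1],
    freq_a=[1.0, 0.5, 1.0],
    freq_b=[0.0, 0.5, 0.0],
    recomb=[0.1, 0.1],
)
```

The middle marker borrows information from its informative neighbours, which is the property that makes a Markov model attractive for long stretches of partially informative SNPs.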
Accurate information on haplotypes and diplotypes (haplotype pairs) is required for population-genetic analyses; however, microarrays do not provide data on a haplotype or diplotype at a copy number variation (CNV) locus; they only provide data on the total number of copies over a diplotype or an unphased sequence genotype (e.g., AAB, unlike AB of single nucleotide polymorphism). Moreover, such copy numbers or genotypes are often incorrectly determined when microarray signal intensities derived from different copy numbers or genotypes are not clearly separated due to noise. Here we report an algorithm to infer CNV haplotypes and individuals’ diplotypes at multiple loci from noisy microarray data, utilizing the probability that a signal intensity may be derived from different underlying copy numbers or genotypes. Performing simulation studies based on known diplotypes and an error model obtained from real microarray data, we demonstrate that this probabilistic approach succeeds in accurate inference (error rate: 1–2%) from noisy data, whereas previous deterministic approaches failed (error rate: 12–18%). Applying this algorithm to real microarray data, we estimated haplotype frequencies and diplotypes in 1486 CNV regions for 100 individuals. Our algorithm will facilitate accurate population-genetic analyses and powerful disease association studies of CNVs.
copy number variation; EM algorithm; haplotype inference; phasing
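The CNV abstract above describes using the probability that a signal intensity arises from each possible copy number, and its keywords mention an EM algorithm. The Python sketch below illustrates that idea for a single locus only. It is not the authors' multi-locus algorithm: the function name, the cap on copies per haplotype, random pairing of haplotypes, and the likelihood-vector input format are all assumptions, and each individual is assumed to have at least one reachable copy number with non-zero likelihood.

```python
def em_cnv_haplotype_freqs(likelihoods, max_hap_copies=2, iterations=200):
    """Estimate frequencies of haplotypes carrying 0..max_hap_copies copies.

    likelihoods: one list per individual, where entry t is the probability of the
    observed signal given a total diploid copy number of t
    (length 2 * max_hap_copies + 1).
    """
    k = max_hap_copies + 1
    freqs = [1.0 / k] * k
    for _ in range(iterations):
        counts = [0.0] * k
        for lik in likelihoods:
            # E-step: posterior weight of each ordered haplotype pair (diplotype).
            weights = {
                (h1, h2): freqs[h1] * freqs[h2] * lik[h1 + h2]
                for h1 in range(k)
                for h2 in range(k)
            }
            total = sum(weights.values())
            for (h1, h2), weight in weights.items():
                counts[h1] += weight / total
                counts[h2] += weight / total
        # M-step: new frequencies are the expected haplotype counts, normalized.
        grand_total = sum(counts)
        freqs = [c / grand_total for c in counts]
    return freqs


# Hypothetical noisy data: signals mostly consistent with one total copy.
noisy = [[0.1, 0.8, 0.1, 0.0, 0.0]] * 10
noisy_freqs = em_cnv_haplotype_freqs(noisy)

# Hypothetical clean data: half the individuals have 0 copies, half have 4.
clean = [[1.0, 0.0, 0.0, 0.0, 0.0]] * 5 + [[0.0, 0.0, 0.0, 0.0, 1.0]] * 5
clean_freqs = em_cnv_haplotype_freqs(clean)
```

A deterministic approach would first force each individual to a single copy number and then phase. Keeping the full likelihood vector lets uncertain calls contribute fractionally, which is the contrast the abstract draws.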
Using the homozygous diploid Saccharomyces deletion collection, we searched for strains with defects in K+ homeostasis. We identified 156 (of 4653 total) strains unable to grow in the presence of hygromycin B, a phenotype previously shown to be indicative of ion defects. The most abundant group was that with deletions of genes known to encode membrane traffic regulators. Nearly 80% of these membrane traffic defective strains showed defects in uptake of the K+ homolog, 86Rb+. Since Trk1, a plasma membrane protein localized to lipid microdomains, is the major K+ influx transporter, we examined the subcellular localization and Triton-X 100 insolubility of Trk1 in 29 of the traffic mutants. However, few of these showed defects in the steady state levels of Trk1, the localization of Trk1 to the plasma membrane, or the localization of Trk1 to lipid microdomains, and most defects were mild compared to wild-type. Three inositol kinase mutants were also identified, and in contrast, loss of these genes negatively affected Trk1 protein levels. In summary, this work reveals a nexus between K+ homeostasis and membrane traffic, which does not involve traffic of the major influx transporter, Trk1.
VPS genes; TRK1
A quantitative trait locus (QTL) affecting female fertility, scored as the inverse of the number of inseminations to conception, on Bos taurus chromosome 7 was detected by a daughter design analysis of the Israeli Holstein population (P < 0.0003). Sires of five of the 10 families analyzed were heterozygous for the QTL. The 95% confidence interval of the QTL spans 27 cM from the centromere. Seven hundred and four SNP markers on the Illumina BovineSNP50 BeadChip within the QTL confidence interval were tested for concordance. A single SNP, NGS-58779, was heterozygous in all five QTL-heterozygous patriarchs and homozygous in the remaining five QTL-homozygous sires. A significant effect on fertility was associated with this marker in the sample of 900 sires genotyped (P < 10⁻⁶). Haplotype phase was the same for four of the five segregating sires. Thus concordance was obtained in nine of the ten families. We identified a common haplotype region associated with the rare and economically favorable allele of the SNP, spanning 270 kbp on BTA7 upstream to 4.72 Mbp. Eleven genes found in the common haplotype region should be considered as positional candidates for the identification of the causative quantitative trait nucleotide. Copy number variation was found in one of these genes, KIAA1683. Four gene variants were identified, but only the number of copies of a specific variant (V1) was significantly associated with breeding values of sires for fertility.
quantitative trait locus (QTL); copy number variation (CNV); KIAA1683; gene-variants; female-fertility
If perturbing two genes together has a stronger or weaker effect than expected, they are said to genetically interact. Genetic interactions are important because they help map gene function, and functionally related genes have similar genetic interaction patterns. Mapping quantitative (positive and negative) genetic interactions on a global scale has recently become possible. These data clearly show groups of genes connected by predominantly positive or negative interactions, termed monochromatic groups. These groups often correspond to functional modules, like biological processes or complexes, or connections between modules. However, it is not yet known how these patterns globally relate to known functional modules. Here we systematically study the monochromatic nature of known biological processes using the largest quantitative genetic interaction data set available, which includes fitness measurements for ∼5.4 million gene pairs in the yeast Saccharomyces cerevisiae. We find that only 10% of biological processes, as defined by Gene Ontology annotations, and less than 1% of inter-process connections are monochromatic. Further, we show that protein complexes are responsible for a surprisingly large fraction of these patterns. This suggests that complexes play a central role in shaping the monochromatic landscape of biological processes. Altogether, this work shows that both positive and negative monochromatic patterns are found in known biological processes and in their connections and that protein complexes play an important role in these patterns. The monochromatic processes, complexes and connections we find chart a hierarchical and modular map of sensitive and redundant biological systems in the yeast cell that will be useful for gene function prediction and comparison across phenotypes and organisms. Furthermore, the analysis methods we develop are applicable to other species for which genetic interactions will progressively become more available.
Genetic interactions indicate functional dependencies between genes and are a powerful tool to predict gene function. Functionally related genes tend to have similar profiles of genetic interactions. Recently, global scale mapping of quantitative (positive and negative) genetic interactions has been performed. These data clearly show groups of genes connected by predominantly positive or negative interactions, termed monochromatic groups. These groups often correspond to functional modules, such as biological processes or protein complexes, or connections between modules, but it is not yet known how these patterns globally relate to known functional modules. Here we systematically evaluate the monochromatic nature of known biological processes and their connections in the yeast Saccharomyces cerevisiae. We find that 10% of biological processes and less than 1% of inter-process connections are monochromatic. Further, we show that protein complexes are responsible for a surprisingly large fraction of these monochromatic groups. The monochromatic processes, complexes and connections we find chart a hierarchical and modular map of sensitive and redundant biological systems in the yeast cell that will be useful for gene function prediction and comparison across phenotypes and organisms.
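The notion of a quantitative genetic interaction, and of a monochromatic group, can be made concrete with a small Python sketch. The multiplicative expectation for double-mutant fitness is a widely used convention that is assumed here and is not stated in the abstract. The purity measure and its threshold are hypothetical illustrations and should not be read as the statistic used in the study.

```python
def interaction_score(fitness_a, fitness_b, fitness_ab):
    """Observed double-mutant fitness minus the multiplicative expectation.

    Negative scores mean the double mutant is sicker than expected.
    Positive scores mean it is healthier than expected.
    """
    return fitness_ab - fitness_a * fitness_b


def monochromatic_purity(scores, threshold=0.08):
    """Signed purity of a group of interaction scores, between -1 and +1.

    Scores within +/- threshold are treated as no interaction and ignored.
    +1 means all remaining interactions are positive, -1 means all are negative.
    The threshold value is an arbitrary illustrative choice.
    """
    positive = sum(1 for s in scores if s > threshold)
    negative = sum(1 for s in scores if s < -threshold)
    total = positive + negative
    if total == 0:
        return 0.0
    return (positive - negative) / total


# Hypothetical fitness values: the double mutant matches the expectation exactly.
neutral_pair = interaction_score(fitness_a=0.8, fitness_b=0.5, fitness_ab=0.4)
# Hypothetical group whose detectable interactions are all positive.
group_purity = monochromatic_purity([0.2, 0.3, 0.01, 0.5])
```

A biological process or an inter-process connection would count as monochromatic under this toy measure when its purity is close to +1 or -1 and the result is unlikely by chance. Assessing that significance is outside the scope of the sketch.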
Intrinsically disordered regions are widespread, especially in proteomes of higher eukaryotes. Recently, protein disorder has been associated with a wide variety of cellular processes and has been implicated in several human diseases. Despite its apparent functional importance, the sheer range of different roles played by protein disorder often makes its exact contribution difficult to interpret.
We attempt to better understand the different roles of disorder using a novel analysis that leverages both comparative genomics and genetic interactions. Strikingly, we find that disorder can be partitioned into three biologically distinct phenomena: regions where disorder is conserved but with quickly evolving amino acid sequences (flexible disorder); regions of conserved disorder with also highly conserved amino acid sequences (constrained disorder); and, lastly, non-conserved disorder. Flexible disorder bears many of the characteristics commonly attributed to disorder and is associated with signaling pathways and multi-functionality. Conversely, constrained disorder has markedly different functional attributes and is involved in RNA binding and protein chaperones. Finally, non-conserved disorder lacks clear functional hallmarks based on our analysis.
Our new perspective on protein disorder clarifies a variety of previous results by putting them into a systematic framework. Moreover, the clear and distinct functional association of flexible and constrained disorder will allow for new approaches and more specific algorithms for disorder detection in a functional context. Finally, in flexible disordered regions, we demonstrate clear evolutionary selection of protein disorder with little selection on primary structure, which has important implications for sequence-based studies of protein structure and evolution.