It has been hypothesized that components of enzymatic pathways might organize into intracellular assemblies to improve their catalytic efficiency or lead to coordinate regulation. Accordingly, de novo purine biosynthesis enzymes may form a purinosome in the absence of purines, and a punctate intracellular body has been identified as the purinosome. We investigated the mechanism by which human de novo purine biosynthetic enzymes might be organized into purinosomes, especially under differing cellular conditions. Regardless of the activity of bodies formed by endogenous enzymes, we demonstrate that intracellular bodies formed by transiently transfected, fluorescently tagged human purine biosynthesis proteins are best explained as protein aggregation.
Motivation: A number of computational methods have been proposed that predict protein–protein interactions (PPIs) based on protein sequence features. Since the number of potential non-interacting protein pairs (negative PPIs) is very high both in absolute terms and in comparison to that of interacting protein pairs (positive PPIs), computational prediction methods rely upon subsets of negative PPIs for training and validation. Hence, the need arises for subset sampling for negative PPIs.
Results: We clarify that there are two fundamentally different types of subset sampling for negative PPIs. One is subset sampling for cross-validated testing, where one desires unbiased subsets so that predictive performance estimated with them can be safely assumed to generalize to the population level. The other is subset sampling for training, where one desires the subsets that best train predictive algorithms, even if these subsets are biased. We show that confusion between these two fundamentally different types of subset sampling led one study recently published in Bioinformatics to the erroneous conclusion that predictive algorithms based on protein sequence features are hardly better than random in predicting PPIs. Rather, both protein sequence features and the ‘hubbiness’ of interacting proteins contribute to effective prediction of PPIs. We provide guidance for appropriate use of random versus balanced sampling.
Availability: The datasets used for this study are available at http://www.marcottelab.org/PPINegativeDataSampling.
Contact: firstname.lastname@example.org; email@example.com
Supplementary Information: Supplementary data are available at Bioinformatics online.
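The distinction drawn above between random and balanced sampling of negative PPIs can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function names, the toy inputs, and the reading of "balanced" as "draw proteins in proportion to how often they appear among the positive pairs" are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def random_negatives(proteins, positives, n, rng):
    """Uniformly sample n protein pairs that are not known to interact.

    Every non-interacting pair is equally likely, which is the unbiased
    behaviour one wants when estimating performance on held-out data.
    """
    known = {frozenset(pair) for pair in positives}
    candidates = [pair for pair in combinations(sorted(proteins), 2)
                  if frozenset(pair) not in known]
    return rng.sample(candidates, n)


def balanced_negatives(positives, n, rng, max_tries=100000):
    """Sample up to n negative pairs, drawing each protein with probability
    proportional to its degree in the positive set (hypothetical reading of
    'balanced'; hubs appear often in both positives and negatives).
    """
    known = {frozenset(pair) for pair in positives}
    degree = Counter(protein for pair in positives for protein in pair)
    names = sorted(degree)
    weights = [degree[name] for name in names]
    chosen = set()
    for _ in range(max_tries):
        if len(chosen) == n:
            break
        a, b = rng.choices(names, weights=weights, k=2)
        pair = frozenset((a, b))
        if a != b and pair not in known:
            chosen.add(pair)
    # Return pairs as sorted tuples in a deterministic order.
    return sorted(tuple(sorted(pair)) for pair in chosen)
```

Under these assumptions, `random_negatives` would feed a test set, while `balanced_negatives` is the kind of deliberately biased subset that might be used only for training.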
Proteins play major roles in most biological processes; as a consequence, protein expression levels are highly regulated. While extensive post-transcriptional, translational and protein degradation control clearly influence protein concentration and functionality, it is often thought that protein abundances are primarily determined by the abundances of the corresponding mRNAs. Surprisingly, a recent study showed that abundances of orthologous nematode and fly proteins correlate better than their corresponding mRNA abundances. We tested whether this phenomenon is general by collecting and testing matching large-scale protein and mRNA expression datasets from seven different species: two bacteria, yeast, nematode, fly, human, and plant. We find that steady-state abundances of proteins show significantly higher correlation across these diverse phylogenetic taxa than the abundances of their corresponding mRNAs (p=0.0008, paired Wilcoxon). These data support the presence of strong selective pressure to maintain protein abundances during evolution, even when mRNA abundances diverge.
Increasing knowledge about the organization of proteins into complexes, systems, and pathways has led to a flowering of theoretical approaches for exploiting this knowledge in order to better learn the functions of proteins and their roles underlying phenotypic traits and diseases. Much of this body of theory has been developed and tested in model organisms, relying on their relative simplicity and genetic and biochemical tractability to accelerate the research. In this review, we discuss several of the major approaches for computationally integrating proteomics and genomics observations into integrated protein networks, then applying guilt-by-association in these networks in order to identify genes underlying traits. Recent trends in this field include a rising appreciation of the modular network organization of proteins underlying traits or mutational phenotypes, and how to exploit such protein modularity using computational approaches related to the internet search algorithm PageRank. Many protein network-based predictions have recently been experimentally confirmed in yeast, worms, plants, and mice, and several successful approaches in model organisms have been directly translated to analyze human disease, with notable recent applications to glioma and breast cancer prognosis.
Data integration; Function prediction; Humans; Model organisms; Phenotype prediction; Protein interaction networks
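As a rough illustration of the PageRank-related guilt-by-association idea discussed in the review above, the following Python sketch runs a personalized PageRank (random walk with restart) from a set of seed genes over an undirected network. The function name, the damping value, and the toy network in the usage note are assumptions for illustration; the review does not specify any particular implementation.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iterations=100):
    """Score each gene by its random-walk proximity to the seed genes.

    edges: iterable of (gene_a, gene_b) undirected links.
    seeds: set of genes already associated with the trait (must be in the network).
    Returns a dict mapping gene -> score; scores sum to 1.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    nodes = sorted(neighbors)

    # The walker restarts at a seed gene with probability (1 - damping).
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)

    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbor m passes on an equal share of its current score.
            incoming = sum(score[m] / len(neighbors[m]) for m in neighbors[n])
            updated[n] = (1 - damping) * restart[n] + damping * incoming
        score = updated
    return score
```

Ranking the non-seed genes by score then gives candidate trait genes; genes inside the same network module as the seeds tend to rank above distant ones.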
Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for all possible PSMs and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for all detected proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses.
integrative analysis; database search; peptide identification
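Once every PSM carries a probability of being correct, as described above, a false discovery rate can be estimated directly from those probabilities. The Python sketch below shows the generic estimate (the expected fraction of incorrect PSMs among those accepted); it is not MSblender's code, and the function name and input format are assumptions.

```python
def accept_at_fdr(psm_probs, max_fdr=0.01):
    """Accept the highest-probability PSMs while the estimated FDR stays
    at or below max_fdr.

    psm_probs: dict mapping a PSM identifier to its probability of being correct.
    Returns (accepted identifiers, estimated FDR of the accepted set).
    """
    ranked = sorted(psm_probs.items(), key=lambda item: item[1], reverse=True)
    accepted = []
    expected_false = 0.0
    for psm, prob in ranked:
        # Each accepted PSM contributes (1 - prob) expected false matches.
        new_false = expected_false + (1.0 - prob)
        if new_false / (len(accepted) + 1) > max_fdr:
            break
        expected_false = new_false
        accepted.append(psm)
    fdr = expected_false / len(accepted) if accepted else 0.0
    return accepted, fdr
```

Because the PSMs are visited in order of decreasing probability, the running estimate never decreases, so stopping at the first violation is safe.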
Haploinsufficiency, wherein a single functional copy of a gene is insufficient to maintain normal function, is a major cause of dominant disease. Human disease studies have identified several hundred haploinsufficient (HI) genes. We have compiled a map of 1,079 haplosufficient (HS) genes by systematic identification of genes unambiguously and repeatedly compromised by copy number variation among 8,458 apparently healthy individuals and contrasted the genomic, evolutionary, functional, and network properties between these HS genes and known HI genes. We found that HI genes are typically longer and have more conserved coding sequences and promoters than HS genes. HI genes exhibit higher levels of expression during early development and greater tissue specificity. Moreover, within a probabilistic human functional interaction network HI genes have more interaction partners and greater network proximity to other known HI genes. We built a predictive model on the basis of these differences and annotated 12,443 genes with their predicted probability of being haploinsufficient. We validated these predictions of haploinsufficiency by demonstrating that genes with a high predicted probability of exhibiting haploinsufficiency are enriched among genes implicated in human dominant diseases and among genes causing abnormal phenotypes in heterozygous knockout mice. We have transformed these gene-based haploinsufficiency predictions into haploinsufficiency scores for genic deletions, which we demonstrate to better discriminate between pathogenic and benign deletions than consideration of the deletion size or numbers of genes deleted. These robust predictions of haploinsufficiency support clinical interpretation of novel loss-of-function variants and prioritization of variants and genes for follow-up studies.
Humans, like most complex organisms, have two copies of most genes in their genome, one from the mother and one from the father. This redundancy provides a back-up copy for most genes, should one copy be lost through mutation. For a minority of genes, one functional copy is not enough to sustain normal human function, and mutations causing the loss of function of one of the copies of such genes are a major cause of childhood developmental diseases. Over the past 20 years medical geneticists have identified over 300 such genes, but it is not known how many of the 22,000 genes in our genome may also be sensitive to gene loss. By comparing these ∼300 genes known to be sensitive to gene loss with over 1,000 genes where loss of a single copy does not result in disease, we have identified some key evolutionary and functional similarities between genes sensitive to loss of a single copy. We have used these similarities to predict for most genes in the genome, whether loss of a single copy is likely to result in disease. These predictions will help in the interpretation of mutations seen in patients.
The Gram-negative bacterium Pseudomonas aeruginosa is a common cause of chronic airway infections in individuals with the heritable disease cystic fibrosis (CF). After prolonged colonization of the CF lung, P. aeruginosa becomes highly resistant to host clearance and antibiotic treatment; therefore, understanding how this bacterium evolves during chronic infection is important for identifying beneficial adaptations that could be targeted therapeutically. To identify potential adaptive traits of P. aeruginosa during chronic infection, we carried out global transcriptomic profiling of chronological clonal isolates obtained from 3 individuals with CF. Isolates were collected sequentially over periods ranging from 3 months to 8 years, representing up to 39,000 in vivo generations. We identified 24 genes that were commonly regulated by all 3 P. aeruginosa lineages, including several genes encoding traits previously shown to be important for in vivo growth. Our results reveal that parallel evolution occurs in the CF lung and that at least a proportion of the traits identified are beneficial for P. aeruginosa chronic colonization of the CF lung.
Deadly diseases like AIDS, malaria, and tuberculosis are the result of long-term chronic infections. Pathogens that cause chronic infections adapt to the host environment, avoiding the immune response and resisting antimicrobial agents. Studies of pathogen adaptation are therefore important for understanding how the efficacy of current therapeutics may change upon prolonged infection. One notorious chronic pathogen is Pseudomonas aeruginosa, a bacterium that causes long-term infections in individuals with the heritable disease cystic fibrosis (CF). We used gene expression profiles to identify 24 genes that commonly changed expression over time in 3 P. aeruginosa lineages, indicating that these changes occur in parallel in the lungs of individuals with CF. Several of these genes have previously been shown to encode traits critical for in vivo-relevant processes, suggesting that they are likely beneficial adaptations important for chronic colonization of the CF lung.
Edge detection is a signal processing algorithm common in artificial intelligence and image recognition programs. We have constructed a genetically encoded edge detection algorithm that programs an isogenic community of E. coli to sense an image of light, communicate to identify the light-dark edges, and visually present the result of the computation. The algorithm is implemented using multiple genetic circuits. An engineered light sensor enables cells to distinguish between light and dark regions. In the dark, cells produce a diffusible chemical signal that diffuses into light regions. Genetic logic gates are used so that only cells that sense light and the diffusible signal produce a positive output. A mathematical model constructed from first principles and parameterized with experimental measurements of the component circuits predicts the performance of the complete program. Quantitatively accurate models will facilitate the engineering of more complex biological behaviors and inform bottom-up studies of natural genetic regulatory networks.
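The logic described above (dark cells emit a signal; a cell turns on only if it senses both light and signal) can be mimicked on a pixel grid. The Python sketch below is a toy abstraction, not the authors' mathematical model: the grid representation, the function name, and the assumption that the signal reaches exactly `reach` grid steps in every direction are all illustrative.

```python
def detect_edges(image, reach=1):
    """Return a grid marking cells that sense light AND the diffusible signal.

    image: list of rows, where 1 means the cell is illuminated and 0 means dark.
    reach: how many grid steps the signal spreads from a dark cell (assumed).
    """
    rows, cols = len(image), len(image[0])

    # Dark cells produce the signal, which spreads to nearby cells.
    signal = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 0:
                for dr in range(-reach, reach + 1):
                    for dc in range(-reach, reach + 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols:
                            signal[rr][cc] = 1

    # Logic gate: output only where light and signal coincide.
    return [[1 if image[r][c] == 1 and signal[r][c] == 1 else 0
             for c in range(cols)]
            for r in range(rows)]
```

With a single dark pixel in the middle of a lit 3×3 grid, only the ring of lit neighbors switches on, which is the light-dark edge.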
Plants are essential sources of food, fiber and renewable energy. Effective methods for manipulating plant traits have important agricultural and economic consequences. We introduce a rational approach for associating genes with plant traits by combined use of a genome-scale functional network and targeted reverse genetic screening. We present a probabilistic network (AraNet) of functional associations among 19,647 (73%) genes of the reference flowering plant Arabidopsis thaliana. AraNet associations have measured precision greater than literature-based protein interactions (21%) for 55% of genes, and are highly predictive for diverse biological pathways. Using AraNet, we found a 10-fold enrichment in identifying early seedling development genes. By interrogating network neighborhoods, we identify At1g80710 (now Drought sensitive 1; Drs1) and At3g05090 (now Lateral root stimulator 1; Lrs1) as novel regulators of drought sensitivity and lateral root development, respectively. AraNet (http://www.functionalnet.org/aranet/) provides a global resource for plant gene function identification and genetic dissection of plant traits.
The planar cell polarity (PCP) signaling pathway is essential for embryonic development because it governs diverse cellular behaviors, and the “core PCP” proteins, such as Dishevelled and Frizzled, have been extensively characterized [1–4]. By contrast, the “PCP effector” proteins, such as Intu and Fuz, remain largely unstudied [5, 6]. These proteins are essential for PCP signaling, but they have never been investigated in a mammal and their cell biological activities remain entirely unknown. We report here that Fuz mutant mice display neural tube defects, skeletal dysmorphologies, and Hedgehog signaling defects stemming from disrupted ciliogenesis. Using bioinformatics and imaging of an in vivo mucociliary epithelium, we establish a central role for Fuz in membrane trafficking, showing that Fuz is essential for trafficking of cargo to basal bodies and to the apical tips of cilia. Fuz is also essential for exocytosis in secretory cells. Finally, we identify a novel, Rab-related small GTPase as a Fuz interaction partner that is also essential for ciliogenesis and secretion. These results are significant because they provide novel insights into the mechanisms by which developmental regulatory systems like PCP signaling interface with fundamental cellular systems such as the vesicle trafficking machinery.
Many genes are toxic when overexpressed, but general mechanisms for this toxicity have proven elusive. Vavouri et al. (2009) find that intrinsic protein disorder and promiscuous molecular interactions are strong determinants of dosage sensitivity, explaining in part the toxicity of dosage-sensitive oncogenes in mice and humans.
The step-by-step assembly process from preribosome in the nucleus to translation-competent 60S ribosome subunit in the cytoplasm is revealed (also see Kemmler et al. in this issue).
The ribosome stalk is essential for recruitment of translation factors. In yeast, P0 and Rpl12 correspond to bacterial L10 and L11 and form the stalk base of mature ribosomes, whereas Mrt4 is a nuclear paralogue of P0. In this study, we show that the dual-specificity phosphatase Yvh1 is required for the release of Mrt4 from the pre-60S subunits. Deletion of YVH1 leads to the persistence of Mrt4 on pre-60S subunits in the cytoplasm. A mutation in Mrt4 at the protein–RNA interface bypasses the requirement for Yvh1. Pre-60S subunits associated with Yvh1 contain Rpl12 but lack both Mrt4 and P0. These results suggest a linear series of events in which Yvh1 binds to the pre-60S subunit to displace Mrt4. Subsequently, P0 loads onto the subunit to assemble the mature stalk, and Yvh1 is released. The initial assembly of the ribosome with Mrt4 may provide functional compartmentalization of ribosome assembly in addition to the spatial separation afforded by the nuclear envelope.
Here we describe human spotted cell chips, a technology for determining cellular state across arrays of cells subjected to chemical or genetic perturbation. Cells are grown and treated under standard tissue culture conditions before being fixed and printed onto replicate glass slides, effectively decoupling the experimental conditions from the assay technique. Each slide is then probed using immunofluorescence or other optical reporter and assayed by automated microscopy. We show potential applications of the cell chip by assaying HeLa and A549 samples for changes in target protein abundance (of the dsRNA-activated protein kinase PKR), subcellular localization (nuclear translocation of NFκB) and activation state (phosphorylation of STAT1 and of the p38 and JNK stress kinases) in response to treatment by several chemical effectors (anisomycin, TNFα, and interferon), and we demonstrate scalability by printing a chip with ∼4,700 discrete samples of HeLa cells. Coupling this technology to high-throughput methods for culturing and treating cell lines could enable researchers to examine the impact of exogenous effectors on the same population of experimentally treated cells across multiple reporter targets potentially representing a variety of molecular systems, thus producing a highly multiplexed dataset with minimized experimental variance and at reduced reagent cost compared to alternative techniques. The ability to prepare and store chips also allows researchers to follow up on observations gleaned from initial screens with maximal repeatability.
Motivation: High-throughput protein identification experiments based on tandem mass spectrometry (MS/MS) often suffer from low sensitivity and low-confidence protein identifications. In a typical shotgun proteomics experiment, it is assumed that all proteins are equally likely to be present. However, there is often other evidence to suggest that a protein is present and confidence in individual protein identification can be updated accordingly.
Results: We develop a method that analyzes MS/MS experiments in the larger context of the biological processes active in a cell. Our method, MSNet, improves protein identification in shotgun proteomics experiments by considering information on functional associations from a gene functional network. MSNet substantially increases the number of proteins identified in the sample at a given error rate. We identify 8–29% more proteins than the original MS experiment when applied to yeast grown in different experimental conditions analyzed on different MS/MS instruments, and 37% more proteins in a human sample. We validate up to 94% of our identifications in yeast by presence in ground-truth reference sets.
Availability and Implementation: Software and datasets are available at http://aug.csres.utexas.edu/msnet
Contact: firstname.lastname@example.org, email@example.com
Supplementary information: Supplementary data are available at Bioinformatics online.
Motivation: Tandem mass spectrometry (MS/MS) offers fast and reliable characterization of complex protein mixtures, but suffers from low sensitivity in protein identification. In a typical shotgun proteomics experiment, it is assumed that all proteins are equally likely to be present. However, there is often other information available, e.g. the probability of a protein's presence is likely to correlate with its mRNA concentration.
Results: We develop a Bayesian score that estimates the posterior probability of a protein's presence in the sample given its identification in an MS/MS experiment and its mRNA concentration measured under similar experimental conditions. Our method, MSpresso, substantially increases the number of proteins identified in an MS/MS experiment at the same error rate, e.g. in yeast, MSpresso increases the number of proteins identified by ∼40%. We apply MSpresso to data from different MS/MS instruments, experimental conditions and organisms (Escherichia coli, human), and predict 19–63% more proteins across the different datasets. MSpresso demonstrates that incorporating prior knowledge of protein presence into shotgun proteomics experiments can substantially improve protein identification scores.
Availability and Implementation: Software is available upon request from the authors. Mass spectrometry datasets and supplementary information are available from http://www.marcottelab.org/MSpresso/.
Contact: firstname.lastname@example.org; email@example.com
Supplementary Information: Supplementary data website: http://www.marcottelab.org/MSpresso/.
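The general idea of combining an MS/MS identification with mRNA-based prior knowledge, as described above, can be shown with a textbook Bayesian odds update. The Python sketch below is not MSpresso's scoring function. It assumes, hypothetically, that the MS/MS probability was computed under a flat prior (`ms_prior`), that an mRNA-derived prior probability of presence is already available, and that all inputs lie strictly between 0 and 1.

```python
def posterior_presence(ms_prob, mrna_prior, ms_prior=0.5):
    """Update an MS/MS identification probability with an mRNA-derived prior.

    ms_prob:    probability of presence reported from the MS/MS data alone,
                assumed to have been computed under the prior ms_prior.
    mrna_prior: prior probability of presence suggested by mRNA abundance.
    All arguments must lie strictly between 0 and 1.
    """
    # Strip the original prior to recover the evidence carried by the MS/MS data.
    likelihood_ratio = (ms_prob / (1 - ms_prob)) / (ms_prior / (1 - ms_prior))
    # Re-apply Bayes' rule with the mRNA-informed prior odds.
    prior_odds = mrna_prior / (1 - mrna_prior)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)
```

Under these assumptions, a marginal identification (for example `ms_prob=0.6`) is pushed up when the mRNA evidence is strong and pushed down when the transcript appears to be absent.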
Polarizing cells extensively restructure cellular components in a spatially and temporally coupled manner along the major axis of cellular extension. Budding yeast are a useful model of polarized growth, helping to define many molecular components of this conserved process. Besides budding, yeast cells also differentiate upon treatment with pheromone from the opposite mating type, forming a mating projection (the ‘shmoo’) by directional restructuring of the cytoskeleton, localized vesicular transport and overall reorganization of the cytosol. To characterize the proteomic localization changes accompanying polarized growth, we developed and implemented a novel cell microarray-based imaging assay for measuring the spatial redistribution of a large fraction of the yeast proteome, and applied this assay to identify proteins localized along the mating projection following pheromone treatment. We further trained a machine learning algorithm to refine the cell imaging screen, identifying additional shmoo-localized proteins. In all, we identified 74 proteins that specifically localize to the mating projection, including previously uncharacterized proteins (Ycr043c, Ydr348c, Yer071c, Ymr295c, and Yor304c-a) and known polarization complexes such as the exocyst. Functional analysis of these proteins, coupled with quantitative analysis of individual organelle movements during shmoo formation, suggests a model in which the basic machinery for cell polarization is generally conserved between processes forming the bud and the shmoo, with a distinct subset of proteins used only for shmoo formation. The net effect is a defined ordering of major organelles along the polarization axis, with specific proteins implicated at the proximal growth tip.
Upon sensing mating pheromone, budding yeast cells form a mating projection (the ‘shmoo’) that serves as a model for polarized cell growth, involving cytoskeletal/cytosolic restructuring and directed vesicular transport. We developed a cell microarray-based imaging assay for measuring localization of the yeast proteome during polarized growth. We find major organelles ordered along the polarization axis, localize 74 proteins to the growth tip, and observe adaptive reuse of general polarization machinery.
Proteomics; polarized growth; subcellular localization; pheromone response; yeast
Proteins interact in complex protein–protein interaction (PPI) networks whose topological properties, such as scale-free topology, hierarchical modularity, and disassortativity, have suggested models of network evolution. Currently preferred models invoke preferential attachment or gene duplication and divergence to produce networks whose topology matches that observed for real PPIs, thus supporting these as likely models for network evolution. Here, we show that the interaction density and homodimeric frequency are highly protein age–dependent in real PPI networks in a manner which does not agree with these canonical models. In light of these results, we propose an alternative stochastic model, which adds each protein sequentially to a growing network in a manner analogous to protein crystal growth (CG) in solution. The key ideas are (1) interaction probability increases with availability of unoccupied interaction surface, thus following an anti-preferential attachment rule, (2) as a network grows, highly connected sub-networks emerge into protein modules or complexes, and (3) once a new protein is committed to a module, further connections tend to be localized within that module. The CG model produces PPI networks consistent in both topology and age distributions with real PPI networks and is well supported by the spatial arrangement of protein complexes of known 3-D structure, suggesting a plausible physical mechanism for network evolution.
Proteins function together forming stable protein complexes or transient interactions in various cellular processes, such as gene regulation and signaling. Here, we address the basic question of how these networks of interacting proteins evolve. This is an important problem, as the structures of such networks underlie important features of biological systems, such as functional modularity, error-tolerance, and stability. It is not yet known how these network architectures originate or what driving forces underlie the observed network structure. Several models have been proposed over the past decade—in particular, a “rich get richer” model (preferential attachment) and a model based upon gene duplication and divergence—often based only on network topologies. Here, we show that real yeast protein interaction networks show a unique age distribution among interacting proteins, which rules out these canonical models. In light of these results, we developed a simple, alternative model based on well-established physical principles, analogous to the process of growing protein crystals in solution. The model better explains many features of real PPI networks, including the network topologies, their characteristic age distributions, and the spatial distribution of subunits of differing ages within protein complexes, suggesting a plausible physical mechanism of network evolution.
Summary: Mass spectrometry-based proteomics stands to gain from additional analysis of its data, but its large, complex datasets make demands on speed and memory usage requiring special consideration from scripting languages. The software library ‘mspire’, developed in the Ruby programming language, offers quick and memory-efficient readers for standard XML proteomics formats, converters for intermediate file types in typical proteomics spectral-identification workflows (including the Bioworks .srf format), and modules for the calculation of peptide false identification rates.
Availability: Freely available at http://mspire.rubyforge.org. Additional data models, usage information, and methods available at http://bioinformatics.icmb.utexas.edu/mspire
BUD23 was identified from a bioinformatics analysis of Saccharomyces cerevisiae genes involved in ribosome biogenesis. Deletion of BUD23 leads to severely impaired growth, reduced levels of the small (40S) ribosomal subunit, and a block in processing 20S rRNA to 18S rRNA, a late step in 40S maturation. Bud23 belongs to the S-adenosylmethionine-dependent Rossmann-fold methyltransferase superfamily and is related to small-molecule methyltransferases. Nevertheless, we considered that Bud23 methylates rRNA. Methylation of G1575 is the only mapped modification for which the methylase has not been assigned. Here, we show that this modification is lost in bud23 mutants. The nuclear accumulation of the small-subunit reporters Rps2-green fluorescent protein (GFP) and Rps3-GFP, as well as the rRNA processing intermediate, the 5′ internal transcribed spacer 1, indicates that bud23 mutants are defective for small-subunit export. Mutations in Bud23 that inactivated its methyltransferase activity complemented a bud23Δ mutant. In addition, mutant ribosomes in which G1575 was changed to adenosine supported growth comparable to that of cells with wild-type ribosomes. Thus, Bud23 protein, but not its methyltransferase activity, is important for biogenesis and export of the 40S subunit in yeast.
Regulation of cell cycle progression is fundamental to cell health and reproduction, and failures in this process are associated with many human diseases. Much of our knowledge of cell cycle regulators derives from loss-of-function studies. To reveal new cell cycle regulatory genes that are difficult to identify in loss-of-function studies, we performed a near-genome-wide flow cytometry assay of yeast gene overexpression-induced cell cycle delay phenotypes. We identified 108 genes whose overexpression significantly delayed the progression of the yeast cell cycle at a specific stage. Many of the genes are newly implicated in cell cycle progression, for example SKO1, RFA1, and YPR015C. The overexpression of RFA1 or YPR015C delayed the cell cycle at G2/M phases by disrupting spindle attachment to chromosomes and activating the DNA damage checkpoint, respectively. In contrast, overexpression of the transcription factor SKO1 arrests cells at G1 phase by activating the pheromone response pathway, revealing new cross-talk between osmotic sensing and mating. More generally, 92%–94% of the genes exhibit distinct phenotypes when overexpressed as compared to their corresponding deletion mutants, supporting the notion that many genes may gain functions upon overexpression. This work thus implicates new genes in cell cycle progression, complements previous screens, and lays the foundation for future experiments to define more precisely roles for these genes in cell cycle progression.
All cells require proper cell cycle regulation; failure leads to numerous human diseases. Cell cycle mechanisms are broadly conserved across eukaryotes, with many key regulatory genes known. Nonetheless, our knowledge of regulators is incomplete. Many classic studies have analyzed yeast loss-of-function mutants to identify cell cycle genes. Studies have also implicated genes based upon their overexpression phenotypes, but the effects of gene overexpression on the cell cycle have not been quantified for all yeast genes. We individually quantified the effect of overexpression on cell cycle progression for nearly all (91%) yeast genes, and we report the 108 genes causing the most significant and reproducible cell cycle defects, most of which have not been previously observed. We characterize three genes in more detail, implicating one in chromosomal segregation and mitotic spindle formation. A second affects mitotic stability and the DNA damage checkpoint. Curiously, overexpression of a third gene, SKO1, arrests the cell cycle by activating the pheromone response pathway, with cells mistakenly behaving as if mating pheromone is present. These results establish a basis for future experiments elucidating precise cell cycle roles for these genes. Similar assays in human cells could help further clarify the many connections between cell cycle control and cancers.
Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations.
We report a significantly improved version (v. 2) of a probabilistic functional gene network of the baker's yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis.
YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org.
U/G and T/G mismatches commonly occur due to spontaneous deamination of cytosine and 5-methylcytosine in double-stranded DNA. This mutagenic effect is particularly strong for extreme thermophiles, since the spontaneous deamination reaction is greatly enhanced at high temperature. Previously, a U/G and T/G mismatch-specific glycosylase (Mth-MIG) was found on a cryptic plasmid of the archaeon Methanobacterium thermoautotrophicum, a thermophile with an optimal growth temperature of 65°C. We report characterization of a putative DNA glycosylase from the hyperthermophilic archaeon Pyrobaculum aerophilum, whose optimal growth temperature is 100°C. The open reading frame was first identified through a genome sequencing project in our laboratory. The predicted product of 230 amino acids shares significant sequence homology with [4Fe-4S]-containing Nth/MutY DNA glycosylases. The histidine-tagged recombinant protein was expressed in Escherichia coli and purified. It is thermostable and displays DNA glycosylase activities specific to U/G and T/G mismatches with an uncoupled AP lyase activity. It also processes U/7,8-dihydro-oxoguanine and T/7,8-dihydro-oxoguanine mismatches. We designate it Pa-MIG. Using sequence comparisons among complete bacterial and archaeal genomes, we have uncovered a putative MIG protein from another hyperthermophilic archaeon, Aeropyrum pernix. The unique conserved amino acid motifs of MIG proteins are proposed to distinguish MIG proteins from the closely related Nth/MutY DNA glycosylases.
The Database of Interacting Proteins (DIP; http://dip.doe-mbi.ucla.edu) is a database that documents experimentally determined protein–protein interactions. This database is intended to provide the scientific community with a comprehensive and integrated tool for browsing and efficiently extracting information about protein interactions and interaction networks in biological processes. Beyond cataloging details of protein–protein interactions, the DIP is useful for understanding protein function and protein–protein relationships, studying the properties of networks of interacting proteins, benchmarking predictions of protein–protein interactions, and studying the evolution of protein–protein interactions.
Analysis of a genetic module repurposed between yeast and vertebrates reveals that a common antifungal medication is also a potent vascular disrupting agent.
Studies in diverse organisms have revealed a surprising depth to the evolutionary conservation of genetic modules. For example, a systematic analysis of such conserved modules has recently shown that genes in yeast that maintain cell walls have been repurposed in vertebrates to regulate vein and artery growth. We reasoned that by analyzing this particular module, we might identify small molecules targeting the yeast pathway that also act as angiogenesis inhibitors suitable for chemotherapy. This insight led to the finding that thiabendazole, an orally available antifungal drug in clinical use for 40 years, also potently inhibits angiogenesis in animal models and in human cells. Moreover, in vivo time-lapse imaging revealed that thiabendazole reversibly disassembles newly established blood vessels, marking it as a vascular disrupting agent (VDA) and thus as a potential complementary therapeutic for use in combination with current anti-angiogenic therapies. Importantly, we also show that thiabendazole slows tumor growth and decreases vascular density in preclinical fibrosarcoma xenografts. Thus, an exploration of the evolutionary repurposing of gene networks has led directly to the identification of a potential new therapeutic application for an inexpensive drug that is already approved for clinical use in humans.
Yeast cells and vertebrate blood vessels would not seem to have much in common. However, we have discovered that during the course of evolution, a group of proteins whose function in yeast is to maintain cell walls has found an alternative use in vertebrates regulating angiogenesis. This remarkable repurposing of the proteins during evolution led us to hypothesize that, despite the different functions of the proteins in humans compared to yeast, drugs that modulated the yeast pathway might also modulate angiogenesis in humans and in animal models. One compound seemed a particularly promising candidate for this sort of approach: thiabendazole (TBZ), which has been in clinical use as a systemic antifungal and deworming treatment for 40 years. Gratifyingly, our study shows that TBZ is indeed able to act as a vascular disrupting agent and an angiogenesis inhibitor. Notably, TBZ also slowed tumor growth and decreased vascular density in human tumors grafted into mice. TBZ’s historical safety data and low cost make it an outstanding candidate for translation to clinical use as a complement to current anti-angiogenic strategies for the treatment of cancer. Our work demonstrates how model organisms from distant branches of the evolutionary tree can be exploited to arrive at a promising new drug.
Gene networks are an efficient route for associating candidate genes with biological processes. Here, networks are used to discover more than 15 new genes for ribosomal subunit maturation, rRNA processing, and ribosomal export from the nucleus.
Biogenesis of ribosomes is an essential cellular process conserved across all eukaryotes and is known to require >170 genes for the assembly, modification, and trafficking of ribosome components through multiple cellular compartments. Despite intensive study, this pathway likely involves many additional genes. Here, we employ network-guided genetics—an approach for associating candidate genes with biological processes that capitalizes on recent advances in functional genomic and proteomic studies—to computationally identify additional ribosomal biogenesis genes. We experimentally evaluated >100 candidate yeast genes in a battery of assays, confirming involvement of at least 15 new genes, including previously uncharacterized genes (YDL063C, YIL091C, YOR287C, YOR006C/TSR3, YOL022C/TSR4). We associate the new genes with specific aspects of ribosomal subunit maturation, ribosomal particle association, and ribosomal subunit nuclear export, and we identify genes specifically required for the processing of 5S, 7S, 20S, 27S, and 35S rRNAs. These results reveal new connections between ribosome biogenesis and mRNA splicing and add >10% new genes—most with human orthologs—to the biogenesis pathway, significantly extending our understanding of a universally conserved eukaryotic process.
Ribosomes are the extremely complex cellular machines responsible for constructing new proteins. In eukaryotic cells, such as yeast, each ribosome contains more than 80 protein or RNA components. These complex machines must themselves be assembled by an even more complex machinery spanning multiple cellular compartments and involving perhaps 200 components in an ordered series of processing events, resulting in delivery of the two halves of the mature ribosome, the 40S and 60S components, to the cytoplasm. The ribosome biogenesis machinery has been only partially characterized, and many lines of evidence suggest that there are additional components that are still unknown. We employed an emerging computational technique called network-guided genetics to identify new candidate genes for this pathway. We then tested the candidates in a battery of experimental assays to determine what roles the genes might play in the biogenesis of ribosomes. This approach proved an efficient route to the discovery of new genes involved in ribosome biogenesis, significantly extending our understanding of a universally conserved eukaryotic process.