Motivation: The model bacterium Escherichia coli is among the best-studied prokaryotes, yet nearly half of its proteins still have no known biological function, despite a wealth of available large-scale physical and genetic interaction data. To address this, we extended the GeneMANIA function prediction web application, developed for model eukaryotes, to support E. coli.
Results: We integrated 48 distinct E. coli functional interaction datasets and used the GeneMANIA algorithm to produce thousands of novel functional predictions and prioritize genes for further functional assays. Our analysis achieved cross-validation performance comparable to that reported for eukaryotic model organisms, and revealed new functions for previously uncharacterized genes in specific bioprocesses, including components required for cell adhesion, iron–sulphur complex assembly and ribosome biogenesis. The GeneMANIA approach for network-based function prediction provides a new tool for probing mechanisms underlying bacterial bioprocesses.
Supplementary data are available at Bioinformatics online.
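To make the idea of network-based function prediction concrete, the following is a minimal sketch of generic guilt-by-association label propagation on a weighted gene network. It is an illustration only and not the GeneMANIA implementation: the toy network, the gene names, the damping factor `alpha` and the function `propagate_labels` are all assumptions introduced here.

```python
def propagate_labels(adjacency, seeds, alpha=0.8, iterations=50):
    """Score each gene by repeatedly blending its seed label with the
    weighted average score of its network neighbours.

    adjacency: dict gene -> dict neighbour -> edge weight (hypothetical data)
    seeds:     dict gene -> 1.0 for genes already annotated with the function
    alpha:     assumed damping factor; higher values trust the network more
    """
    genes = list(adjacency)
    scores = {g: seeds.get(g, 0.0) for g in genes}
    for _ in range(iterations):
        updated = {}
        for g in genes:
            neighbours = adjacency[g]
            total_weight = sum(neighbours.values())
            if total_weight > 0:
                neighbour_avg = sum(
                    w * scores[n] for n, w in neighbours.items()
                ) / total_weight
            else:
                neighbour_avg = 0.0
            # Keep part of the original label, take the rest from neighbours.
            updated[g] = (1 - alpha) * seeds.get(g, 0.0) + alpha * neighbour_avg
        scores = updated
    return scores


# Toy symmetric network with made-up gene names and weights.
network = {
    "geneA": {"geneB": 1.0, "geneC": 0.5},
    "geneB": {"geneA": 1.0, "geneC": 1.0},
    "geneC": {"geneA": 0.5, "geneB": 1.0, "geneD": 0.2},
    "geneD": {"geneC": 0.2, "geneE": 1.0},
    "geneE": {"geneD": 1.0},
}
seeds = {"geneA": 1.0, "geneB": 1.0}  # genes with a known annotation

scores = propagate_labels(network, seeds)
# Rank unannotated genes by their propagated score, best candidate first.
ranked = sorted((g for g in network if g not in seeds),
                key=scores.get, reverse=True)
print(ranked)
```

In this toy case the unannotated gene most tightly linked to the seed genes ranks first, which is the intuition behind prioritizing genes for follow-up assays.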
Our analysis examines the conservation of multiprotein complexes among metazoa through use of high resolution biochemical fractionation and precision mass spectrometry applied to soluble cell extracts from 5 representative model organisms: Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Strongylocentrotus purpuratus, and Homo sapiens. The interaction network obtained from the data was validated globally in 4 distant species (Xenopus laevis, Nematostella vectensis, Dictyostelium discoideum, Saccharomyces cerevisiae) and locally by targeted affinity-purification experiments. Here we provide details of our massive set of supporting biochemical fractionation data, available via ProteomeXchange (PXD002319-PXD002328); PPIs, available via BioGRID (185267); and interaction network projections, available at http://metazoa.med.utoronto.ca, all made fully accessible to allow further exploration. The datasets here are related to the research article on metazoan macromolecular complexes in Nature.
Proteomics; Metazoa; Protein complexes; Biochemical; Fractionation
In embryonic stem cells (ESCs), gene regulatory networks (GRNs) coordinate gene expression to maintain ESC identity; however, the complete repertoire of factors regulating the ESC state is not fully understood. Our previous temporal microarray analysis of ESC commitment identified the E3 ubiquitin ligase protein Makorin-1 (MKRN1) as a potential novel component of the ESC GRN. Here, using multilayered systems-level analyses, we compiled a MKRN1-centered interactome in undifferentiated ESCs at the proteomic and ribonomic level. Proteomic analyses in undifferentiated ESCs revealed that MKRN1 associates with RNA-binding proteins, and ensuing RIP-chip analysis determined that MKRN1 associates with mRNAs encoding functionally related proteins including proteins that function during cellular stress. Subsequent biological validation identified MKRN1 as a novel stress granule-resident protein, although MKRN1 is not required for stress granule formation, or survival of unstressed ESCs. Thus, our unbiased systems-level analyses support a role for the E3 ligase MKRN1 as a ribonucleoprotein within the ESC GRN.
embryonic stem cells; makorin-1; RNA metabolism; stress granules
Spatially targeted optical microproteomics (STOMP) is a novel proteomics technique for interrogating micron-scale regions of interest (ROIs) in mammalian tissue, with no requirement for genetic manipulation. Methanol- or formalin-fixed specimens are stained with fluorescent dyes or antibodies to visualize ROIs, then soaked in solutions containing the photo-tag 4-benzoylbenzyl-glycyl-hexahistidine. Confocal imaging and two-photon excitation are used to covalently couple photo-tags to all proteins within each ROI, to a resolution of 0.67 µm in the xy-plane and 1.48 µm axially. After tissue solubilization, photo-tagged proteins are isolated and identified by mass spectrometry. As a test case, we examined amyloid plaques in an Alzheimer's disease (AD) mouse model and a post-mortem AD case, confirming known plaque constituents and discovering new ones. STOMP can be applied to various biological samples including cell lines, primary cell cultures, ex vivo specimens, biopsy samples, and fixed post-mortem tissue.
Neurodegenerative diseases such as Alzheimer's disease affect millions of people worldwide. In many of these diseases, toxic proteins accumulate in the brain and build up as small ‘plaques’ in the gaps, or synapses, that cells called neurons communicate across. Eventually, the plaques prevent the neurons signaling to each other correctly, leading to problems such as memory loss.
Identifying the proteins present in plaques is technically challenging, partly because the plaques are very small. Hadley, Rakhit et al. have now developed a new method called spatially targeted optical microproteomics (or STOMP) that can collect proteins from small areas of cells. In this method, plaques are identified under a light microscope, and their contents are attached to a molecule called a photo-affinity tag using lasers. The photo-tagged proteins are then pulled out using beads that specifically bind to the photo-affinity tag. The proteins can then be identified using a well-established method called mass spectrometry.
Hadley, Rakhit et al. used STOMP to analyze plaques present in the brains of mice that develop similar symptoms to those seen in humans with Alzheimer's disease. This revealed that these plaques contain more than 50 different proteins, some of which had not previously been found in plaques. In particular, several proteins from the ‘presynaptic’ neuron that sends signals across the synapse were found in the plaques. However, no proteins from the receiving (‘postsynaptic’) neuron on the other side of the synapse were present in the plaque.
Fixed human brain tissue is more difficult to analyze than mouse samples because it is modified for storage. In spite of these issues, Hadley, Rakhit et al. also successfully used STOMP to identify the proteins in human plaques. STOMP can be used to identify the proteins present in any area of a cell and thus has the potential to be widely used by scientists, not just those studying plaques.
proteomics; confocal microscopy; mass spectroscopy; protein misfolding; protein folding; human; mouse
During mitosis, the spindle assembly checkpoint (SAC) monitors the attachment of kinetochores (KTs) to the plus ends of spindle microtubules (MTs) and prevents anaphase onset until chromosomes are aligned and KTs are under proper tension. Here, we identify a SAC component, BuGZ/ZNF207, from an RNAi viability screen in human Glioblastoma multiforme (GBM) brain tumor stem cells. BuGZ binds to and stabilizes Bub3 during interphase and mitosis through a highly conserved GLE2p-binding sequence (GLEBS) domain. Inhibition of BuGZ results in loss of both Bub3 and its binding partner Bub1 from KTs, reduction of Bub1-dependent phosphorylation of centromeric histone H2A, attenuation of KT-based Aurora kinase B activity, and lethal chromosome congression defects in cancer cells. Phylogenetic analysis indicates that BuGZ orthologs are highly conserved among eukaryotes, but are conspicuously absent from budding and fission yeasts. These findings suggest BuGZ has evolved to facilitate Bub3 activity and chromosome congression in higher eukaryotes.
BuGZ; ZNF207; Bub3; spindle assembly checkpoint; kinetochore; cancer stem cells; Glioblastoma multiforme
The RNA polymerase II (RNAPII) carboxyl-terminal domain (CTD) heptapeptide repeats (Y1-S2-P3-T4-S5-P6-S7) undergo dynamic phosphorylation and dephosphorylation during the transcription cycle to recruit factors that regulate transcription, RNA processing and chromatin modification. We show here that RPRD1A and RPRD1B form homodimers and heterodimers through their coiled-coil domains and interact preferentially via CTD interaction domains (CIDs) with CTD repeats phosphorylated at S2 and S7. Our high resolution crystal structures of the RPRD1A, RPRD1B and RPRD2 CIDs, alone and in complex with CTD phosphoisoforms, elucidate the molecular basis of CTD recognition. In an interesting example of cross-talk between different CTD modifications, our data also indicate that RPRD1A and RPRD1B associate directly with RPAP2 phosphatase and, by interacting with CTD repeats where phospho-S2 and/or phospho-S7 bracket a phospho-S5 residue, serve as CTD scaffolds to coordinate the dephosphorylation of phospho-S5 by RPAP2.
Efforts to map the Escherichia coli interactome have identified several hundred macromolecular complexes, but direct binary protein-protein interactions (PPIs) have not been surveyed on a large scale. Here we performed yeast two-hybrid screens of 3,305 baits against 3,606 preys (~70% of the E. coli proteome) in duplicate to generate a map of 2,234 interactions, approximately doubling the number of known binary PPIs in E. coli. Integration of binary PPIs and genetic interactions revealed functional dependencies among components involved in cellular processes, including envelope integrity, flagellum assembly and protein quality control. Many of the binary interactions that could be mapped within multi-protein complexes were informative regarding internal topology and indicated that interactions within complexes are significantly more conserved than those interactions connecting different complexes. This resource will be useful for inferring bacterial gene function and provides a draft reference of the basic physical wiring network of this evolutionarily significant model microbe.
Protein-protein interactions (PPIs) and multi-protein complexes perform central roles in the cellular systems of all living organisms. In humans, disruptions of the normal patterns of PPIs and protein complexes can be causative or indicative of a disease state. Recent developments in the biological applications of mass spectrometry (MS)-based proteomics have expanded the horizon for the application of systematic large-scale mapping of physical interactions to probe disease mechanisms. In this review, we examine the application of MS-based approaches for the experimental analysis of PPI networks and protein complexes, focusing on the different model systems (including human cells) used to study the molecular basis of common diseases such as cancer, cardiomyopathies, diabetes, microbial infections, and genetic and neurodegenerative disorders.
Large-scale proteomic analyses in Escherichia coli have documented the composition and physical relationships of multiprotein complexes, but not their functional organization into biological pathways and processes. Conversely, genetic interaction (GI) screens can provide insights into the biological role(s) of individual genes and higher-order associations. Combining the information from both approaches should elucidate how complexes and pathways intersect functionally at a systems level. However, such integrative analysis has been hindered by the lack of relevant GI data. Here we present a systematic, unbiased, and quantitative synthetic genetic array screen in E. coli describing the genetic dependencies and functional cross-talk among over 600,000 digenic mutant combinations. Combining this epistasis information with putative functional modules derived from previous proteomic data and genomic context-based methods revealed unexpected associations, including new components required for iron-sulphur biogenesis and ribosome integrity, and the interplay between molecular chaperones and proteases. We find that functionally-linked genes co-conserved among γ-proteobacteria are far more likely to have correlated GI profiles than genes with divergent patterns of evolution. Overall, examining bacterial GIs in the context of protein complexes provides avenues for a deeper mechanistic understanding of core microbial systems.
Genome-wide genetic interaction (GI) screens have been performed in yeast, but no analogous large-scale studies have yet been reported for bacteria. Here, we have used E. coli synthetic genetic array (eSGA) technology developed by our group to quantitatively map GIs to reveal epistatic dependencies and functional cross-talk among ∼600,000 digenic mutant combinations. By combining this epistasis information with functional modules derived by our group's earlier efforts from proteomic and genomic context (GC)-based methods, we identify several unexpected pathway-level dependencies, functional links between protein complexes, and biological roles of uncharacterized bacterial gene products. As part of the study, two of our pathway predictions from GI screens were validated experimentally, confirming the role of these new components in iron-sulphur biogenesis and ribosome integrity. We also extrapolated the epistatic connectivity diagram of E. coli to 233 distantly related γ-proteobacterial species lacking GI information, and identified co-conserved genes and functional modules important for bacterial pathogenesis. Overall, this study describes the first genome-scale map of GIs in a Gram-negative bacterium, and through integrative analysis with previously derived protein-protein and GC-based interaction networks presents a number of novel insights into the architecture of bacterial pathways that could not have been discerned through either network alone.
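As an illustration of what quantifying an epistatic dependency between two genes can mean, the sketch below uses one commonly used definition: the deviation of the observed double-mutant fitness from the product of the two single-mutant fitnesses (a multiplicative null model). This is an assumption chosen for illustration; the scoring scheme actually used in the eSGA screens is not described in the text above, and the fitness values, the threshold and the function names are hypothetical.

```python
def epistasis_score(w_a, w_b, w_ab):
    """Observed double-mutant fitness minus the multiplicative expectation.

    w_a, w_b: relative fitness of each single mutant (wild type = 1.0)
    w_ab:     relative fitness of the double mutant
    """
    return w_ab - w_a * w_b


def classify_interaction(score, threshold=0.1):
    """Label a score; the threshold is an arbitrary illustrative cut-off."""
    if score <= -threshold:
        return "aggravating"   # double mutant is sicker than expected
    if score >= threshold:
        return "alleviating"   # double mutant is fitter than expected
    return "neutral"


# Hypothetical fitness measurements for three gene pairs.
pairs = {
    "pair1": (0.9, 0.8, 0.30),  # much worse than the expected 0.72
    "pair2": (0.9, 0.8, 0.72),  # matches the expectation
    "pair3": (0.9, 0.8, 0.90),  # better than expected
}
calls = {name: classify_interaction(epistasis_score(*w))
         for name, w in pairs.items()}
print(calls)
```

Under this definition, an aggravating (synthetic sick or lethal) pair has a negative score, which is the kind of signal used to link genes acting in parallel or compensating pathways.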
MoxR ATPases are widespread throughout bacteria and archaea. The experimental evidence to date suggests that these proteins have chaperone-like roles in facilitating the maturation of dedicated protein complexes that are functionally diverse. In Escherichia coli, the MoxR ATPase RavA and its putative cofactor ViaA are found to exist in early stationary-phase cells at 37°C at low levels of about 350 and 90 molecules per cell, respectively. Both proteins are predominantly localized to the cytoplasm, but ViaA was also unexpectedly found to localize to the cell membrane. Whole genome microarrays and synthetic lethality studies both indicated that RavA-ViaA are genetically linked to Fe-S cluster assembly and specific respiratory pathways. Systematic analysis of mutant strains of ravA and viaA indicated that RavA-ViaA sensitizes cells to sublethal concentrations of aminoglycosides. Furthermore, this effect was dependent on RavA's ATPase activity, and on the presence of specific subunits of NADH:ubiquinone oxidoreductase I (Nuo Complex, or Complex I). Importantly, both RavA and ViaA were found to physically interact with specific Nuo subunits. We propose that RavA-ViaA facilitate the maturation of the Nuo complex.
Protein kinase signaling regulates human hematopoietic stem/progenitor cell (HSPC) fate, yet little is known about critical pathway substrates. To address this, we have developed and applied a large-scale, empirically-optimized phosphopeptide affinity enrichment strategy with high-throughput 2D LC-MS/MS screening to evaluate the phosphoproteome of an isolated human CD34+ HSPC population. We first used hydrophilic interaction chromatography (HILIC) as a first-dimension separation to simplify protein digest mixtures into discrete fractions. Phosphopeptides were then enriched offline using TiO2-coated magnetic beads and subsequently detected online by C18 reverse phase nanoflow HPLC using data-dependent MS/MS High-Energy Collision-activated Dissociation (HCD) fragmentation on a high performance Orbitrap hybrid tandem mass spectrometer. We identified 15533 unique phosphopeptides in 3574 putative phosphoproteins. Systematic computational analysis revealed biological pathways and phosphopeptide motifs enriched in CD34+ HSPC that are markedly different from those observed in an analogous parallel analysis of isolated human T cells, pointing to the possible involvement of specific kinase-substrate relationships within activated cascades driving hematopoietic renewal, commitment and differentiation.
human; hematopoietic; stem cell; signaling; phosphoprotein; phosphopeptide; chromatography; enrichment; tandem mass spectrometry
This study defines a network of synthetic sick/lethal interactions with a set of query genes in a series of isogenic cancer cell lines. Analysis of differential essentiality reveals general properties in genetic interaction networks derived from studies on model organisms.
This study defined about 200 negative genetic interactions in the isogenic cancer cell line background. Mapping of negative genetic interactions in a systematic fashion in isogenic cancer cell lines has revealed novel functions for several uncharacterized genes. This study demonstrates that differential essentiality profiles derived from isogenic cancer cell lines can be used to classify genetic dependencies in non-isogenic cancer cell lines.
Improved efforts are necessary to define the functional product of cancer mutations currently being revealed through large-scale sequencing efforts. Using genome-scale pooled shRNA screening technology, we mapped negative genetic interactions across a set of isogenic cancer cell lines and confirmed hundreds of these interactions in orthogonal co-culture competition assays to generate a high-confidence genetic interaction network of differential essentiality (DiE) genes. The network uncovered examples of conserved genetic interactions, densely connected functional modules derived from comparative genomics with model systems data, functions for uncharacterized genes in the human genome and targetable vulnerabilities. Finally, we demonstrate the general applicability of DiE gene signatures in determining genetic dependencies of other non-isogenic cancer cell lines. For example, the PTEN−/− DiE genes reveal a signature that can preferentially classify PTEN-dependent genotypes across a series of non-isogenic cell lines derived from breast, pancreatic and ovarian cancers. Our reference network suggests that many cancer vulnerabilities remain to be discovered through systematic derivation of a network of differentially essential genes in an isogenic cancer cell model.
genetic interaction; genome stability; mitotic stress; pooled shRNA screening
The yeast HECT-family E3 ubiquitin ligase Rsp5 has been implicated in diverse cell functions. Previously, we and others reported the physical and functional interaction of Rsp5 with the deubiquitinating enzyme Ubp2, and the ubiquitin associated (UBA) domain-containing cofactor Rup1. To investigate the mechanism and significance of the Rsp5-Rup1-Ubp2 complex, we examined Rsp5 ubiquitination status in the presence or absence of these cofactors. We found that, similar to its mammalian homologues, Rsp5 is auto-ubiquitinated in vivo. Association with a substrate or Rup1 increased Rsp5 self-ubiquitination, whereas Ubp2 efficiently deubiquitinates Rsp5 in vivo and in vitro. The data reported here imply an auto-modulatory mechanism of Rsp5 regulation common to other E3 ligases.
We describe the discovery of UNC1215, a potent and selective chemical probe for the methyl-lysine (Kme) reading function of L3MBTL3, a member of the malignant brain tumor (MBT) family of chromatin interacting transcriptional repressors. UNC1215 binds L3MBTL3 with a Kd of 120 nM, competitively displacing mono- or dimethyl-lysine containing peptides, and is greater than 50-fold selective versus other members of the MBT family while also demonstrating selectivity against more than 200 other reader domains examined. X-ray crystallography identified a novel 2:2 polyvalent mode of interaction. In cells, UNC1215 is non-toxic and binds directly to L3MBTL3 via the Kme-binding pocket of the MBT domains. UNC1215 increases the cellular mobility of GFP-L3MBTL3 fusion proteins and point mutants that disrupt the Kme binding function of GFP-L3MBTL3 phenocopy the effects of UNC1215. Finally, UNC1215 demonstrates a novel Kme-dependent interaction of L3MBTL3 with BCLAF1, a protein implicated in DNA damage repair and apoptosis.
Cellular processes often depend on stable physical associations between proteins. Despite recent progress, knowledge of the composition of human protein complexes remains limited. To close this gap, we applied an integrative global proteomic profiling approach, based on chromatographic separation of cultured human cell extracts into more than one thousand biochemical fractions which were subsequently analyzed by quantitative tandem mass spectrometry, to systematically identify a network of 13,993 high-confidence physical interactions among 3,006 stably-associated soluble human proteins. Most of the 622 putative protein complexes we report are linked to core biological processes, and encompass both candidate disease genes and unannotated proteins to inform on mechanism. Strikingly, whereas larger multi-protein assemblies tend to be more extensively annotated and evolutionarily conserved, human protein complexes with 5 or fewer subunits are far more likely to be functionally unannotated or restricted to vertebrates, suggesting more recent functional innovations.
Cardiomyopathies are diseases of the heart resulting in impaired cardiac muscle function, which can lead to heart dilation or overt heart failure. These diseases represent a major cause of global morbidity and death. Innovative preventive and therapeutic measures are urgently needed for early detection, categorization, and treatment of patients at risk of cardiomyopathy. These developments will require a more complete understanding of the molecular effects of impaired cardiac function, even prior to overt disease. The use of gel-free expression proteomics in the detailed analysis of cardiac tissues should yield significant insight into the pathophysiology of these diseases.
PMID: 17172675 CAMSID: cams3063
Cardiac muscle; multidimensional protein identification technology (MudPIT); mass spectrometry
While phosphotyrosine modification is an established regulatory mechanism in eukaryotes, it is less well characterized in bacteria due to low prevalence. To gain insight into the extent and biological importance of tyrosine phosphorylation in Escherichia coli, we used immunoaffinity-based phosphotyrosine peptide enrichment combined with high resolution mass spectrometry analysis to comprehensively identify tyrosine phosphorylated proteins and accurately map phosphotyrosine sites. We identified a total of 512 unique phosphotyrosine sites on 342 proteins in E. coli K12 and the human pathogen enterohemorrhagic E. coli (EHEC) O157:H7, representing the largest phosphotyrosine proteome reported to date in bacteria. This large number of tyrosine phosphorylation sites allowed us to define five phosphotyrosine site motifs. Tyrosine phosphorylated proteins belong to various functional classes such as metabolism, gene expression and virulence. We demonstrate for the first time that proteins of a type III secretion system (T3SS), required for the attaching and effacing (A/E) lesion phenotype characteristic for intestinal colonization by certain EHEC strains, are tyrosine phosphorylated by bacterial kinases. Yet, A/E lesion and metabolic phenotypes were unaffected by the mutation of the two currently known tyrosine kinases, Etk and Wzc. Substantial residual tyrosine phosphorylation present in an etk wzc double mutant strongly indicated the presence of hitherto unknown tyrosine kinases in E. coli. We assess the functional importance of tyrosine phosphorylation and demonstrate that the phosphorylated tyrosine residue of the regulator SspA positively affects expression and secretion of T3SS proteins and formation of A/E lesions. Altogether, our study reveals that tyrosine phosphorylation in bacteria is more prevalent than previously recognized, and suggests the involvement of phosphotyrosine-mediated signaling in a broad range of cellular functions and virulence.
While phosphotyrosine modification is established in eukaryotic cell signaling, it is less well characterized in bacteria. Although deletion of bacterial tyrosine kinases is known to affect various cellular functions and the virulence of bacterial pathogens, few phosphotyrosine proteins are currently known. To gain insight into the extent and biological function of tyrosine phosphorylation in E. coli, we carried out in-depth phosphotyrosine protein profiling using a mass spectrometry-based proteomics approach. Our study of E. coli K12 and the human pathogen enterohemorrhagic E. coli (EHEC) O157:H7, which is a common cause of food-borne outbreaks of diarrhea, hemorrhagic colitis and hemolytic uremic syndrome, reveals that tyrosine phosphorylation is far more prevalent than previously recognized. Target proteins are involved in a broad range of cellular functions and virulence. Proteins of the type III secretion system (T3SS), required for the attaching and effacing (A/E) lesion phenotype characteristic of intestinal colonization by EHEC, are tyrosine phosphorylated. The expression of these T3SS proteins and A/E lesion formation are affected by a tyrosine phosphorylated residue on the regulator SspA. Also, our data indicate the presence of hitherto unknown E. coli tyrosine kinases. Overall, tyrosine phosphorylation seems to be involved in controlling cellular core processes and virulence of bacteria.
The Hsp70–Hsp110 chaperone complex antagonizes Cin8 plus-end motility and prevents premature spindle elongation in S phase.
Systematic affinity purification combined with mass spectrometry analysis of N- and C-tagged cytoplasmic Hsp70/Hsp110 chaperones was used to identify new roles of Hsp70/Hsp110 in the cell. This allowed the mapping of a chaperone–protein network consisting of 1,227 unique interactions between the 9 chaperones and 473 proteins and highlighted roles for Hsp70/Hsp110 in 14 broad biological processes. Using this information, we uncovered an essential role for Hsp110 in spindle assembly and, more specifically, in modulating the activity of the widely conserved kinesin-5 motor Cin8. The role of Hsp110 Sse1 as a nucleotide exchange factor for the Hsp70 chaperones Ssa1/Ssa2 was found to be required for maintaining the proper distribution of kinesin-5 motors within the spindle, which was subsequently required for bipolar spindle assembly in S phase. These data suggest a model whereby the Hsp70–Hsp110 chaperone complex antagonizes Cin8 plus-end motility and prevents premature spindle elongation in S phase.
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions, replication forks, and 5′-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Cas1; CRISPR; DNA recombination; DNA repair; nuclease; YgbT
As the interface between a microbe and its environment, the bacterial cell envelope has broad biological and clinical significance. While numerous biosynthesis genes and pathways have been identified and studied in isolation, how these intersect functionally to ensure envelope integrity during adaptive responses to environmental challenge remains unclear. To this end, we performed high-density synthetic genetic screens to generate quantitative functional association maps encompassing virtually the entire cell envelope biosynthetic machinery of Escherichia coli under both auxotrophic (rich medium) and prototrophic (minimal medium) culture conditions. The differential patterns of genetic interactions detected among >235,000 digenic mutant combinations tested reveal unexpected condition-specific functional crosstalk and genetic backup mechanisms that ensure stress-resistant envelope assembly and maintenance. These networks also provide insights into the global systems connectivity and dynamic functional reorganization of a universal bacterial structure that is both broadly conserved among eubacteria (including pathogens) and an important target.
Proper assembly of the cell envelope is essential for bacterial growth, environmental adaptation, and drug resistance. Yet, while the biological roles of the many genes and pathways involved in biosynthesis of the cell envelope have been studied extensively in isolation, how the myriad components intersect functionally to maintain envelope integrity under different growth conditions has not been explored systematically. Genome-scale genetic interaction screens have increasingly been performed to great impact in yeast; no analogous comprehensive studies have yet been reported for bacteria despite their prominence in human health and disease. We addressed this by using a synthetic genetic array technology to generate quantitative maps of genetic interactions encompassing virtually all the components of the cell envelope biosynthetic machinery of the classic model bacterium E. coli in two common laboratory growth conditions (rich and minimal medium). From the resulting networks of high-confidence genetic interactions, we identify condition-specific functional dependencies underlying envelope assembly and global remodeling of genetic backup mechanisms that ensure envelope integrity under environmental challenge.
RNA polymerase II (RNAP II) C-terminal domain (CTD) phosphorylation is important for various transcription-related processes. Here, we identify by affinity purification and mass spectrometry three previously uncharacterized human CTD-interaction domain (CID)-containing proteins, RPRD1A, RPRD1B and RPRD2, which co-purify with RNAP II and three other RNAP II-associated proteins, RPAP2, GRINL1A and RECQL5, but not with the Mediator complex. RPRD1A and RPRD1B can accompany RNAP II from promoter regions to 3′-untranslated regions during transcription in vivo, predominantly interact with phosphorylated RNAP II, and can reduce CTD S5- and S7-phosphorylated RNAP II at target gene promoters. Thus, the RPRD proteins are likely to have multiple important roles in transcription.
RPRD1A; RPRD1B; CID; CTD; RNA polymerase II
Elongation factor RbbA is required for ATP-dependent deacyl-tRNA release, presumably after each peptide bond formation; however, there is no information about its cellular role. Proteomic analysis in Escherichia coli revealed that RbbA reciprocally co-purified with a conserved inner membrane protein of unknown function, YhjD. Both proteins are also physically associated with the 30S ribosome and with members of the lipopolysaccharide transport machinery. Genome-wide genetic screens of rbbA and yhjD deletion mutants revealed aggravating genetic interactions with mutants deficient in the electron transport chain (ETC). Cells lacking both rbbA and yhjD exhibited reduced cell division, respiration and global protein synthesis as well as increased sensitivity to antibiotics targeting the ETC and the accuracy of protein synthesis. Our results suggest that RbbA functions together with YhjD as part of a regulatory network that impacts bacterial oxidative phosphorylation and translation efficiency.
Motivation: A post-translational modification (PTM) is a chemical modification of a protein that occurs naturally. Many of these modifications, such as phosphorylation, are known to play pivotal roles in the regulation of protein function. Hence, PTM perturbations have been linked to diverse diseases such as Parkinson's, Alzheimer's, diabetes and cancer. To discover PTMs on a genome-wide scale, there has been a recent surge of interest in analyzing tandem mass spectrometry data, and several unrestrictive (so-called ‘blind’) PTM search methods have been reported. However, these approaches are subject to noise in mass measurements and in the predicted modification site (amino acid position) within peptides, which can result in false PTM assignments.
Results: To address these issues, we devised a machine learning algorithm, PTMClust, that can be applied to the output of blind PTM search methods to improve prediction quality by suppressing noise in the data and clustering peptides with the same underlying modification to form PTM groups. We show that our technique outperforms two standard clustering algorithms on a simulated dataset. In addition, our algorithm significantly improves sensitivity and specificity when applied to the output of three different blind PTM search engines, SIMS, InsPecT and MODmap, and markedly outperforms another PTM refinement algorithm, PTMFinder. We demonstrate that our technique is able to reduce false PTM assignments, improve overall detection coverage and facilitate novel PTM discovery, including terminus modifications. We applied our technique to a large-scale yeast MS/MS proteome profiling dataset and found numerous known and novel PTMs. Accurately identifying modifications in protein sequences is a critical first step for PTM profiling, and thus our approach may benefit routine proteomic analysis.
Availability: Our algorithm is implemented in Matlab and is freely available for academic use. The software is available online from http://genes.toronto.edu.
Supplementary Information: Supplementary data are available at Bioinformatics online.
Chromatin modification (CM) plays a key role in regulating transcription, DNA replication, repair and recombination. However, our knowledge of these processes in humans remains very limited. Here we use computational approaches to study proteins and functional domains involved in CM in humans. We analyze the abundance and the pair-wise domain-domain co-occurrences of 25 well-documented CM domains in 5 model organisms: yeast, worm, fly, mouse and human. Results show that domains involved in histone methylation, DNA methylation, and histone variants are remarkably expanded in metazoans, reflecting the increased demand for cell type-specific gene regulation. We find that CM domains tend to co-occur with a limited number of partner domains and are hence not promiscuous. This property is exploited to identify 47 potentially novel CM domains, including 24 DNA-binding domains, whose role in CM has received little attention so far. Lastly, we use a consensus machine learning approach to predict 379 novel CM genes (coding for 329 proteins) in humans based on domain compositions. Several of these predictions are supported by very recent experimental studies and others are slated for experimental verification. Identification of novel CM genes and domains in humans will aid our understanding of fundamental epigenetic processes that are important for stem cell differentiation and cancer biology. Information on all the candidate CM domains and genes reported here is publicly available.
Gene-set enrichment analysis is a useful technique to help functionally characterize large gene lists, such as the results of gene expression experiments. This technique finds functionally coherent gene-sets, such as pathways, that are statistically over-represented in a given gene list. Ideally, the number of resulting sets is smaller than the number of genes in the list, thus simplifying interpretation. However, the increasing number and redundancy of gene-sets used by many current enrichment analysis tools work against this ideal.
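As a rough illustration of what "statistically over-represented" can mean in practice, the sketch below computes a one-sided hypergeometric tail probability for the overlap between a gene list and a single gene-set. The choice of the hypergeometric test is an assumption made here for illustration (the text above does not name a specific statistic), and the function name and the example counts are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(universe_size, set_size, list_size, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    universe_size: number of genes considered in total (assumed background)
    set_size:      number of genes in the gene-set (e.g. a pathway)
    list_size:     number of genes in the query gene list
    overlap:       number of genes shared by the list and the gene-set
    """
    max_overlap = min(set_size, list_size)
    # Number of ways to pick the query list from the universe.
    total = comb(universe_size, list_size)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 4000 background genes, a 50-gene pathway,
# a 200-gene query list, and 10 genes in common.
p_value = overrepresentation_pvalue(4000, 50, 200, 10)
print(f"p = {p_value:.3g}")
```

In a real analysis this test would be repeated for every gene-set and corrected for multiple testing, which is one reason a large, redundant gene-set collection produces long and hard-to-read result lists.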
To overcome gene-set redundancy and help in the interpretation of large gene lists, we developed “Enrichment Map”, a network-based visualization method for gene-set enrichment results. Gene-sets are organized in a network, where each set is a node and edges represent gene overlap between sets. Automated network layout groups related gene-sets into network clusters, enabling the user to quickly identify the major enriched functional themes and more easily interpret the enrichment results.
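The network construction described above can be sketched in a few lines. This is a minimal illustration, not the Enrichment Map implementation: it assumes the Jaccard index as the overlap measure and an arbitrary threshold of 0.25, and it uses connected components as a simple stand-in for the clusters that an automated layout would reveal. All function names, gene-set names and gene identifiers are hypothetical.

```python
from itertools import combinations


def jaccard(genes_a, genes_b):
    """Jaccard index: shared genes divided by the union of both sets."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_overlap_network(gene_sets, threshold=0.25):
    """Return edges (set_a, set_b, score) for pairs whose overlap passes the threshold."""
    edges = []
    for name_a, name_b in combinations(sorted(gene_sets), 2):
        score = jaccard(gene_sets[name_a], gene_sets[name_b])
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges


def connected_components(nodes, edges):
    """Group gene-sets that are linked, directly or indirectly, by overlap edges."""
    adjacency = {node: set() for node in nodes}
    for a, b, _ in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Toy enrichment result: each node is one enriched gene-set.
toy_sets = {
    "set_A": {"g1", "g2", "g3", "g4"},
    "set_B": {"g2", "g3", "g4", "g5"},
    "set_C": {"g8", "g9"},
}
edges = build_overlap_network(toy_sets)
clusters = connected_components(toy_sets, edges)
print(edges)
print(clusters)
```

Here the two heavily overlapping sets end up in one group while the unrelated set stays on its own, which is the effect that lets redundant gene-sets collapse into a single visual theme.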
Enrichment Map is a significant advance in the interpretation of enrichment analysis. Any research project that generates a list of genes can take advantage of this visualization framework. Enrichment Map is implemented as a freely available and user-friendly plug-in for the Cytoscape network visualization software (http://baderlab.org/Software/EnrichmentMap/).