MoxR ATPases are widespread throughout bacteria and archaea. The experimental evidence to date suggests that these proteins have chaperone-like roles in facilitating the maturation of dedicated protein complexes that are functionally diverse. In Escherichia coli, the MoxR ATPase RavA and its putative cofactor ViaA are found to exist in early stationary-phase cells at 37°C at low levels of about 350 and 90 molecules per cell, respectively. Both proteins are predominantly localized to the cytoplasm, but ViaA was also unexpectedly found to localize to the cell membrane. Whole genome microarrays and synthetic lethality studies both indicated that RavA-ViaA are genetically linked to Fe-S cluster assembly and specific respiratory pathways. Systematic analysis of mutant strains of ravA and viaA indicated that RavA-ViaA sensitizes cells to sublethal concentrations of aminoglycosides. Furthermore, this effect was dependent on RavA's ATPase activity, and on the presence of specific subunits of NADH:ubiquinone oxidoreductase I (Nuo Complex, or Complex I). Importantly, both RavA and ViaA were found to physically interact with specific Nuo subunits. We propose that RavA-ViaA facilitate the maturation of the Nuo complex.
Protein kinase signaling regulates human hematopoietic stem/progenitor cell (HSPC) fate, yet little is known about critical pathway substrates. To address this, we have developed and applied a large-scale, empirically optimized phosphopeptide affinity enrichment strategy with high-throughput 2D LC-MS/MS screening to evaluate the phosphoproteome of an isolated human CD34+ HSPC population. We used hydrophilic interaction chromatography (HILIC) as a first-dimension separation to simplify protein digest mixtures into discrete fractions. Phosphopeptides were then enriched offline using TiO2-coated magnetic beads and subsequently detected online by C18 reverse-phase nanoflow HPLC using data-dependent MS/MS High-Energy Collision-activated Dissociation (HCD) fragmentation on a high-performance Orbitrap hybrid tandem mass spectrometer. We identified 15533 unique phosphopeptides in 3574 putative phosphoproteins. Systematic computational analysis revealed biological pathways and phosphopeptide motifs enriched in CD34+ HSPC that are markedly different from those observed in an analogous parallel analysis of isolated human T cells, pointing to the possible involvement of specific kinase-substrate relationships within activated cascades driving hematopoietic renewal, commitment and differentiation.
human; hematopoietic; stem cell; signaling; phosphoprotein; phosphopeptide; chromatography; enrichment; tandem mass spectrometry
This study defines a network of synthetic sick/lethal interactions with a set of query genes in a series of isogenic cancer cell lines. Analysis of differential essentiality reveals general properties in genetic interaction networks derived from studies on model organisms.
This study defined about 200 negative genetic interactions in the isogenic cancer cell line background. Mapping of negative genetic interactions in a systematic fashion in isogenic cancer cell lines has revealed novel functions for several uncharacterized genes. This study demonstrates that differential essentiality profiles derived from isogenic cancer cell lines can be used to classify genetic dependencies in non-isogenic cancer cell lines.
Improved efforts are necessary to define the functional product of cancer mutations currently being revealed through large-scale sequencing efforts. Using genome-scale pooled shRNA screening technology, we mapped negative genetic interactions across a set of isogenic cancer cell lines and confirmed hundreds of these interactions in orthogonal co-culture competition assays to generate a high-confidence genetic interaction network of differentially essential (differential essentiality, DiE) genes. The network uncovered examples of conserved genetic interactions, densely connected functional modules derived from comparative genomics with model systems data, functions for uncharacterized genes in the human genome and targetable vulnerabilities. Finally, we demonstrate the general applicability of DiE gene signatures in determining genetic dependencies of other non-isogenic cancer cell lines. For example, the PTEN−/− DiE genes reveal a signature that can preferentially classify PTEN-dependent genotypes across a series of non-isogenic cell lines derived from breast, pancreatic and ovarian cancers. Our reference network suggests that many cancer vulnerabilities remain to be discovered through systematic derivation of a network of differentially essential genes in an isogenic cancer cell model.
genetic interaction; genome stability; mitotic stress; pooled shRNA screening
The yeast HECT-family E3 ubiquitin ligase Rsp5 has been implicated in diverse cell functions. Previously, we and others reported the physical and functional interaction of Rsp5 with the deubiquitinating enzyme Ubp2, and the ubiquitin associated (UBA) domain-containing cofactor Rup1. To investigate the mechanism and significance of the Rsp5-Rup1-Ubp2 complex, we examined Rsp5 ubiquitination status in the presence or absence of these cofactors. We found that, similar to its mammalian homologues, Rsp5 is auto-ubiquitinated in vivo. Association with a substrate or Rup1 increased Rsp5 self-ubiquitination, whereas Ubp2 efficiently deubiquitinates Rsp5 in vivo and in vitro. The data reported here imply an auto-modulatory mechanism of Rsp5 regulation common to other E3 ligases.
We describe the discovery of UNC1215, a potent and selective chemical probe for the methyl-lysine (Kme) reading function of L3MBTL3, a member of the malignant brain tumor (MBT) family of chromatin interacting transcriptional repressors. UNC1215 binds L3MBTL3 with a Kd of 120 nM, competitively displacing mono- or dimethyl-lysine containing peptides, and is greater than 50-fold selective versus other members of the MBT family while also demonstrating selectivity against more than 200 other reader domains examined. X-ray crystallography identified a novel 2:2 polyvalent mode of interaction. In cells, UNC1215 is non-toxic and binds directly to L3MBTL3 via the Kme-binding pocket of the MBT domains. UNC1215 increases the cellular mobility of GFP-L3MBTL3 fusion proteins and point mutants that disrupt the Kme binding function of GFP-L3MBTL3 phenocopy the effects of UNC1215. Finally, UNC1215 demonstrates a novel Kme-dependent interaction of L3MBTL3 with BCLAF1, a protein implicated in DNA damage repair and apoptosis.
Cellular processes often depend on stable physical associations between proteins. Despite recent progress, knowledge of the composition of human protein complexes remains limited. To close this gap, we applied an integrative global proteomic profiling approach, based on chromatographic separation of cultured human cell extracts into more than one thousand biochemical fractions which were subsequently analyzed by quantitative tandem mass spectrometry, to systematically identify a network of 13,993 high-confidence physical interactions among 3,006 stably-associated soluble human proteins. Most of the 622 putative protein complexes we report are linked to core biological processes, and encompass both candidate disease genes and unannotated proteins to inform on mechanism. Strikingly, whereas larger multi-protein assemblies tend to be more extensively annotated and evolutionarily conserved, human protein complexes with 5 or fewer subunits are far more likely to be functionally unannotated or restricted to vertebrates, suggesting more recent functional innovations.
Cardiomyopathies are diseases of the heart resulting in impaired cardiac muscle function, which can lead to heart dilation or overt heart failure. These diseases represent a major cause of global morbidity and death. Innovative preventive and therapeutic measures are urgently needed for early detection, categorization, and treatment of patients at risk of cardiomyopathy. These developments will require a more complete understanding of the molecular effects of impaired cardiac function, even prior to overt disease. The use of gel-free expression proteomics in the detailed analysis of cardiac tissues should yield significant insight into the pathophysiology of these diseases.
PMID: 17172675 CAMSID: cams3063
Cardiac muscle; multidimensional protein identification technology (MudPIT); mass spectrometry
While phosphotyrosine modification is an established regulatory mechanism in eukaryotes, it is less well characterized in bacteria due to low prevalence. To gain insight into the extent and biological importance of tyrosine phosphorylation in Escherichia coli, we used immunoaffinity-based phosphotyrosine peptide enrichment combined with high resolution mass spectrometry analysis to comprehensively identify tyrosine phosphorylated proteins and accurately map phosphotyrosine sites. We identified a total of 512 unique phosphotyrosine sites on 342 proteins in E. coli K12 and the human pathogen enterohemorrhagic E. coli (EHEC) O157:H7, representing the largest phosphotyrosine proteome reported to date in bacteria. This large number of tyrosine phosphorylation sites allowed us to define five phosphotyrosine site motifs. Tyrosine phosphorylated proteins belong to various functional classes such as metabolism, gene expression and virulence. We demonstrate for the first time that proteins of a type III secretion system (T3SS), required for the attaching and effacing (A/E) lesion phenotype characteristic for intestinal colonization by certain EHEC strains, are tyrosine phosphorylated by bacterial kinases. Yet, A/E lesion and metabolic phenotypes were unaffected by the mutation of the two currently known tyrosine kinases, Etk and Wzc. Substantial residual tyrosine phosphorylation present in an etk wzc double mutant strongly indicated the presence of hitherto unknown tyrosine kinases in E. coli. We assess the functional importance of tyrosine phosphorylation and demonstrate that the phosphorylated tyrosine residue of the regulator SspA positively affects expression and secretion of T3SS proteins and formation of A/E lesions. Altogether, our study reveals that tyrosine phosphorylation in bacteria is more prevalent than previously recognized, and suggests the involvement of phosphotyrosine-mediated signaling in a broad range of cellular functions and virulence.
While phosphotyrosine modification is established in eukaryotic cell signaling, it is less well characterized in bacteria. Although deletion of bacterial tyrosine kinases is known to affect various cellular functions and the virulence of bacterial pathogens, few phosphotyrosine proteins are currently known. To gain insight into the extent and biological function of tyrosine phosphorylation in E. coli, we carried out in-depth phosphotyrosine protein profiling using a mass spectrometry-based proteomics approach. Our study of E. coli K12 and the human pathogen enterohemorrhagic E. coli (EHEC) O157:H7, which is a common cause of food-borne outbreaks of diarrhea, hemorrhagic colitis and hemolytic uremic syndrome, reveals that tyrosine phosphorylation is far more prevalent than previously recognized. Target proteins are involved in a broad range of cellular functions and virulence. Proteins of the type III secretion system (T3SS), required for the attaching and effacing (A/E) lesion phenotype characteristic of intestinal colonization by EHEC, are tyrosine phosphorylated. The expression of these T3SS proteins and A/E lesion formation are affected by a tyrosine phosphorylated residue on the regulator SspA. Also, our data indicate the presence of hitherto unknown E. coli tyrosine kinases. Overall, tyrosine phosphorylation seems to be involved in controlling core cellular processes and virulence of bacteria.
The Hsp70–Hsp110 chaperone complex antagonizes Cin8 plus-end motility and prevents premature spindle elongation in S phase.
Systematic affinity purification combined with mass spectrometry analysis of N- and C-tagged cytoplasmic Hsp70/Hsp110 chaperones was used to identify new roles of Hsp70/Hsp110 in the cell. This allowed the mapping of a chaperone–protein network consisting of 1,227 unique interactions between the 9 chaperones and 473 proteins and highlighted roles for Hsp70/Hsp110 in 14 broad biological processes. Using this information, we uncovered an essential role for Hsp110 in spindle assembly and, more specifically, in modulating the activity of the widely conserved kinesin-5 motor Cin8. The role of Hsp110 Sse1 as a nucleotide exchange factor for the Hsp70 chaperones Ssa1/Ssa2 was found to be required for maintaining the proper distribution of kinesin-5 motors within the spindle, which was subsequently required for bipolar spindle assembly in S phase. These data suggest a model whereby the Hsp70–Hsp110 chaperone complex antagonizes Cin8 plus-end motility and prevents premature spindle elongation in S phase.
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions, replication forks, and 5′-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Cas1; CRISPR; DNA recombination; DNA repair; nuclease; YgbT
As the interface between a microbe and its environment, the bacterial cell envelope has broad biological and clinical significance. While numerous biosynthesis genes and pathways have been identified and studied in isolation, how these intersect functionally to ensure envelope integrity during adaptive responses to environmental challenge remains unclear. To this end, we performed high-density synthetic genetic screens to generate quantitative functional association maps encompassing virtually the entire cell envelope biosynthetic machinery of Escherichia coli under both auxotrophic (rich medium) and prototrophic (minimal medium) culture conditions. The differential patterns of genetic interactions detected among >235,000 digenic mutant combinations tested reveal unexpected condition-specific functional crosstalk and genetic backup mechanisms that ensure stress-resistant envelope assembly and maintenance. These networks also provide insights into the global systems connectivity and dynamic functional reorganization of a universal bacterial structure that is both broadly conserved among eubacteria (including pathogens) and an important target.
Proper assembly of the cell envelope is essential for bacterial growth, environmental adaptation, and drug resistance. Yet, while the biological roles of the many genes and pathways involved in biosynthesis of the cell envelope have been studied extensively in isolation, how the myriad components intersect functionally to maintain envelope integrity under different growth conditions has not been explored systematically. Genome-scale genetic interaction screens have increasingly been performed to great impact in yeast; no analogous comprehensive studies have yet been reported for bacteria despite their prominence in human health and disease. We addressed this by using a synthetic genetic array technology to generate quantitative maps of genetic interactions encompassing virtually all the components of the cell envelope biosynthetic machinery of the classic model bacterium E. coli in two common laboratory growth conditions (rich and minimal medium). From the resulting networks of high-confidence genetic interactions, we identify condition-specific functional dependencies underlying envelope assembly and global remodeling of genetic backup mechanisms that ensure envelope integrity under environmental challenge.
RNA polymerase II (RNAP II) C-terminal domain (CTD) phosphorylation is important for various transcription-related processes. Here, we identify by affinity purification and mass spectrometry three previously uncharacterized human CTD-interaction domain (CID)-containing proteins, RPRD1A, RPRD1B and RPRD2, which co-purify with RNAP II and three other RNAP II-associated proteins, RPAP2, GRINL1A and RECQL5, but not with the Mediator complex. RPRD1A and RPRD1B can accompany RNAP II from promoter regions to 3′-untranslated regions during transcription in vivo, predominantly interact with phosphorylated RNAP II, and can reduce CTD S5- and S7-phosphorylated RNAP II at target gene promoters. Thus, the RPRD proteins are likely to have multiple important roles in transcription.
RPRD1A; RPRD1B; CID; CTD; RNA polymerase II
Elongation factor RbbA is required for ATP-dependent deacyl-tRNA release, presumably after each peptide bond formation; however, there is no information about its cellular role. Proteomic analysis in Escherichia coli revealed that RbbA reciprocally co-purified with a conserved inner membrane protein of unknown function, YhjD. Both proteins are also physically associated with the 30S ribosome and with members of the lipopolysaccharide transport machinery. Genome-wide genetic screens of rbbA and yhjD deletion mutants revealed aggravating genetic interactions with mutants deficient in the electron transport chain (ETC). Cells lacking both rbbA and yhjD exhibited reduced cell division, respiration and global protein synthesis as well as increased sensitivity to antibiotics targeting the ETC and the accuracy of protein synthesis. Our results suggest that RbbA functions together with YhjD as part of a regulatory network that impacts bacterial oxidative phosphorylation and translation efficiency.
Motivation: A post-translational modification (PTM) is a chemical modification of a protein that occurs naturally. Many of these modifications, such as phosphorylation, are known to play pivotal roles in the regulation of protein function. Accordingly, PTM perturbations have been linked to diverse diseases such as Parkinson's, Alzheimer's, diabetes and cancer. To discover PTMs on a genome-wide scale, there has been a recent surge of interest in analyzing tandem mass spectrometry data, and several unrestrictive (so-called ‘blind’) PTM search methods have been reported. However, these approaches are subject to noise in mass measurements and in the predicted modification site (amino acid position) within peptides, which can result in false PTM assignments.
Results: To address these issues, we devised a machine learning algorithm, PTMClust, that can be applied to the output of blind PTM search methods to improve prediction quality, by suppressing noise in the data and clustering peptides with the same underlying modification to form PTM groups. We show that our technique outperforms two standard clustering algorithms on a simulated dataset. We further show that our algorithm significantly improves sensitivity and specificity when applied to the output of three different blind PTM search engines, SIMS, InsPecT and MODmap. In addition, PTMClust markedly outperforms another PTM refinement algorithm, PTMFinder. We demonstrate that our technique is able to reduce false PTM assignments, improve overall detection coverage and facilitate novel PTM discovery, including terminus modifications. We applied our technique to a large-scale yeast MS/MS proteome profiling dataset and found numerous known and novel PTMs. Accurately identifying modifications in protein sequences is a critical first step for PTM profiling, and thus our approach may benefit routine proteomic analysis.
Availability: Our algorithm is implemented in Matlab and is freely available for academic use. The software is available online from http://genes.toronto.edu.
Supplementary Information: Supplementary data are available at Bioinformatics online.
Chromatin modification (CM) plays a key role in regulating transcription, DNA replication, repair and recombination. However, our knowledge of these processes in humans remains very limited. Here we use computational approaches to study proteins and functional domains involved in CM in humans. We analyze the abundance and the pair-wise domain-domain co-occurrences of 25 well-documented CM domains in 5 model organisms: yeast, worm, fly, mouse and human. Results show that domains involved in histone methylation, DNA methylation, and histone variants are remarkably expanded in metazoans, reflecting the increased demand for cell type-specific gene regulation. We find that CM domains tend to co-occur with a limited number of partner domains and are hence not promiscuous. This property is exploited to identify 47 potentially novel CM domains, including 24 DNA-binding domains, whose role in CM has received little attention so far. Lastly, we use a consensus machine learning approach to predict 379 novel CM genes (coding for 329 proteins) in humans based on domain compositions. Several of these predictions are supported by very recent experimental studies and others are slated for experimental verification. Identification of novel CM genes and domains in humans will aid our understanding of fundamental epigenetic processes that are important for stem cell differentiation and cancer biology. Information on all the candidate CM domains and genes reported here is publicly available.
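The pair-wise domain co-occurrence analysis described above can be illustrated with a minimal sketch. It assumes each protein is given as a list of domain names; the protein identifiers, the function names `domain_cooccurrence` and `partner_counts`, and the toy input are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def domain_cooccurrence(proteins):
    """Count, for each unordered domain pair, the proteins containing both."""
    pair_counts = Counter()
    for domains in proteins.values():
        # De-duplicate repeated domains, then enumerate unordered pairs.
        for pair in combinations(sorted(set(domains)), 2):
            pair_counts[pair] += 1
    return pair_counts


def partner_counts(pair_counts):
    """Number of distinct partner domains seen with each domain."""
    partners = {}
    for a, b in pair_counts:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {domain: len(found) for domain, found in partners.items()}


# Hypothetical toy input: protein ID -> domain architecture.
toy_proteins = {
    "P1": ["SET", "PHD"],
    "P2": ["SET", "PHD", "Bromo"],
    "P3": ["Chromo"],
}
pairs = domain_cooccurrence(toy_proteins)
partners = partner_counts(pairs)
```

A domain with few distinct partners in `partners` would count as non-promiscuous in the sense used above; any cutoff for "few" would be a further assumption.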
Gene-set enrichment analysis is a useful technique to help functionally characterize large gene lists, such as the results of gene expression experiments. This technique finds functionally coherent gene-sets, such as pathways, that are statistically over-represented in a given gene list. Ideally, the number of resulting sets is smaller than the number of genes in the list, thus simplifying interpretation. However, the increasing number and redundancy of gene-sets used by many current enrichment analysis tools work against this ideal.
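One common way to test whether a gene-set is statistically over-represented in a gene list is a one-sided hypergeometric test. The sketch below is only an illustration of that idea; the text does not say which statistic any particular enrichment tool uses, and the function name `hypergeom_enrichment_p` and its parameters are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a universe that contains set_size members of the gene-set."""
    max_overlap = min(set_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers: a universe of 10 genes, a gene-set of 3, a list of 3 genes,
# all of which fall in the gene-set.
p_value = hypergeom_enrichment_p(10, 3, 3, 3)
```

In practice, p-values computed this way for many gene-sets would also need multiple-testing correction, which is omitted here.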
To overcome gene-set redundancy and help in the interpretation of large gene lists, we developed “Enrichment Map”, a network-based visualization method for gene-set enrichment results. Gene-sets are organized in a network, where each set is a node and edges represent gene overlap between sets. Automated network layout groups related gene-sets into network clusters, enabling the user to quickly identify the major enriched functional themes and more easily interpret the enrichment results.
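The network construction described above, with gene-sets as nodes and gene overlap as edges, can be sketched as follows. The overlap coefficient and the 0.5 threshold are assumptions chosen for illustration; the text does not specify the similarity measure or cutoff, and `enrichment_map_edges` is a hypothetical name.

```python
from itertools import combinations


def enrichment_map_edges(gene_sets, threshold=0.5):
    """Return (set_a, set_b, score) for gene-set pairs whose overlap
    coefficient |A & B| / min(|A|, |B|) reaches the threshold."""
    edges = []
    for name_a, name_b in combinations(sorted(gene_sets), 2):
        a, b = gene_sets[name_a], gene_sets[name_b]
        smaller = min(len(a), len(b))
        if smaller == 0:
            continue  # an empty gene-set cannot overlap anything
        score = len(a & b) / smaller
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges


# Hypothetical toy gene-sets.
toy_sets = {
    "apoptosis": {"BAX", "BCL2", "CASP3"},
    "cell death": {"BAX", "CASP3", "TP53", "FAS"},
    "translation": {"RPL3", "RPS6"},
}
edges = enrichment_map_edges(toy_sets)
```

The resulting edge list could then be handed to any network layout tool so that redundant gene-sets cluster together.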
Enrichment Map is a significant advance in the interpretation of enrichment analysis. Any research project that generates a list of genes can take advantage of this visualization framework. Enrichment Map is implemented as a freely available and user friendly plug-in for the Cytoscape network visualization software (http://baderlab.org/Software/EnrichmentMap/).
Chromatin modification (CM) is a set of epigenetic processes that govern many aspects of DNA replication, transcription and repair. CM is carried out by groups of physically interacting proteins, and their disruption has been linked to a number of complex human diseases. CM remains largely unexplored, however, especially in higher eukaryotes such as human. Here we present the DAnCER resource, which integrates information on genes with CM function from five model organisms, including human. Currently integrated are gene functional annotations, Pfam domain architecture, protein interaction networks and associated human diseases. Additional supporting evidence includes orthology relationships across organisms, membership in protein complexes, and information on protein 3D structure. These data are available for 962 experimentally confirmed and manually curated CM genes and for over 5000 genes with predicted CM function on the basis of orthology and domain composition. DAnCER allows visual explorations of the integrated data and flexible query capabilities using a variety of data filters. In particular, disease information and functional annotations are mapped onto the protein interaction networks, enabling the user to formulate new hypotheses on the function and disease associations of a given gene based on those of its interaction partners. DAnCER is freely available at http://wodaklab.org/dancer/.
Protein and mRNA copy numbers vary from cell to cell in isogenic bacterial populations. However, these molecules often exist in low copy numbers, and are difficult to detect in single cells. Here we carried out quantitative system-wide analyses of protein and mRNA expression in individual cells with single-molecule sensitivity using a newly constructed yellow fluorescent protein fusion library for Escherichia coli. We found that almost all protein number distributions can be described by the gamma distribution with two fitting parameters which, at low expression levels, have clear physical interpretations as the transcription rate and protein burst size. At high expression levels, the distributions are dominated by extrinsic noise. Strikingly, we found that a single cell's protein and mRNA copy numbers for any given gene are uncorrelated.
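A gamma distribution with shape a and scale b has mean ab and variance ab^2, so the two parameters can be estimated from the sample mean and variance of single-cell copy numbers. The sketch below uses this method-of-moments estimate purely for illustration; it is not necessarily the fitting procedure used in the study, and the function name `gamma_moments` and the toy counts are hypothetical. Reading the shape as the rate-like parameter and the scale as the burst size is an assumed mapping, since the text does not state which fitted parameter corresponds to which quantity.

```python
from statistics import mean, pvariance


def gamma_moments(copy_numbers):
    """Method-of-moments estimate of gamma shape (a) and scale (b).

    Uses mean = a * b and variance = a * b**2, hence
    a = mean**2 / variance and b = variance / mean.
    """
    m = mean(copy_numbers)
    v = pvariance(copy_numbers)
    if m <= 0 or v <= 0:
        raise ValueError("need a positive mean and variance")
    shape = m * m / v   # assumed to play the rate-like role
    scale = v / m       # assumed to play the burst-size role (Fano factor)
    return shape, scale


# Hypothetical per-cell protein counts for one gene.
toy_counts = [2, 4, 4, 4, 5, 5, 7, 9]
shape, scale = gamma_moments(toy_counts)
```

For these toy counts the mean is 5 and the population variance is 4, giving a shape of 6.25 and a scale of 0.8.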
Histone variant H2A.Z has a conserved role in genome stability, although it remains unclear how this is mediated. Here we demonstrate in fission yeast that the Swr1 ATPase inserts H2A.Z (Pht1) into chromatin and Kat5 acetyltransferase (Mst1) acetylates it. Deletion or unacetylatable mutation of Pht1 leads to genome instability, primarily caused by chromosome entanglement/breakage at anaphase. This leads to the loss of telomere-proximal markers, though telomere protection and repeat length are unaffected by the absence of Pht1. Strikingly the chromosome entanglement in pht1Δ anaphase cells can be rescued by forcing chromosome condensation prior to anaphase onset. We show that the condensin complex, required for the maintenance of anaphase chromosome condensation, prematurely dissociates from chromatin in the absence of Pht1. This and other findings suggest an important role for H2A.Z in the architecture of anaphase chromosomes.
Chromosome architecture; condensin; H2A.Z; KAT5; RCA; S. pombe
Global protein expression profiling can potentially uncover perturbations associated with common forms of heart disease. We have used shotgun tandem mass spectrometry to monitor the state of biological systems in cardiac tissue correlating with disease onset, cardiac insufficiency and progression to heart failure in a time-course mouse model of dilated cardiomyopathy (DCM). However, interpreting the functional significance of the hundreds of differentially expressed proteins has been challenging. Here, we utilize improved enrichment statistical methods and an extensive collection of functionally related gene sets, gaining a more comprehensive understanding of the progressive alterations associated with functional decline in DCM. We visualize the enrichment results as an Enrichment Map, where significant gene sets are grouped based on annotation similarity. This approach vastly simplifies the interpretation of the large number of enriched gene-sets found. For pathways of specific interest, such as Apoptosis and the MAPK cascade, we performed a more detailed analysis of the underlying signaling network, including experimental validation of expression patterns.
Proteomics; Gene Expression; Mass Spectrometry; Quantitation; Cardiomyopathy; Pathway Analysis
The yjeE, yeaZ, and ygjD genes are highly conserved in the genomes of eubacteria, and ygjD orthologs are also found throughout the Archaea and eukaryotes. In this study, we have constructed conditional expression strains for each of these genes in the model organism Escherichia coli K12. We show that each gene is essential for the viability of E. coli under laboratory growth conditions. Growth of the conditional strains under nonpermissive conditions results in dramatic changes in cell ultrastructure. Deliberate repression of the expression of yeaZ results in cells with highly condensed nucleoids, while repression of yjeE and ygjD expression results in at least a proportion of very enlarged cells with an unusual peripheral distribution of DNA. Each of the three conditional expression strains can be complemented by multicopy clones harboring the rstA gene, which encodes a two-component-system response regulator, strongly suggesting that these proteins are involved in the same essential cellular pathway. The results of bacterial two-hybrid experiments show that YeaZ can interact with both YjeE and YgjD but that YgjD is the preferred interaction partner. The results of in vitro experiments indicate that YeaZ mediates the proteolysis of YgjD, suggesting that YeaZ and YjeE act as regulators to control the activity of this protein. Our results are consistent with these proteins forming a link between DNA metabolism and cell division.
One of the key issues in the post-genomic era is to assign functions to uncharacterized proteins. Proteins seldom act alone; rather, they must interact with other biomolecular units to execute their functions. Thus, the functions of unknown proteins may be discovered by studying their interactions with proteins of known function. Although many approaches have been developed for this purpose, one of the main limitations of most of these methods is that the dependence among functional terms has not been taken into account.
We developed a new network-based protein function prediction method which combines the likelihood scores of local classifiers with a relaxation labelling technique. The framework can incorporate the inter-relationship among functional labels into the function prediction procedure and allows us to efficiently discover relevant non-local dependence. We compared the performance of the new method against one other representative network-based function prediction method using E. coli protein functional association networks.
Our results showed that the new method has better prediction performance than the previous method. The better predictive power of our method gives new insights about the importance of the dependence between functional terms in protein functional prediction.
The bacterial cell-envelope consists of a complex arrangement of lipids, proteins and carbohydrates that serves as the interface between a microorganism and its environment or, with pathogens, a human host. Escherichia coli has long been investigated as a leading model system to elucidate the fundamental mechanisms underlying microbial cell-envelope biology. This includes extensive descriptions of the molecular identities, biochemical activities and evolutionary trajectories of integral transmembrane proteins, many of which play critical roles in infectious disease and antibiotic resistance. Strikingly, however, only half of the c. 1200 putative cell-envelope-related proteins of E. coli currently have experimentally attributed functions, indicating an opportunity for discovery. In this review, we summarize the state of the art of computational and proteomic approaches for determining the components of the E. coli cell-envelope proteome, as well as exploring the physical and functional interactions that underlie its biogenesis and functionality. We also provide a comprehensive comparative benchmarking analysis on the performance of different bioinformatic and proteomic methods commonly used to determine the subcellular localization of bacterial proteins.
cell-envelope; Escherichia coli; subcellular localization; algorithms; bioinformatics; proteomic methods
Molecular chaperones are known to be involved in many cellular functions; however, a detailed and comprehensive overview of the interactions between chaperones and their cofactors and substrates is still absent. Systematic analysis of physical TAP-tag based protein–protein interactions of all 63 known chaperones in Saccharomyces cerevisiae has been carried out. These chaperones include seven small heat-shock proteins, three members of the AAA+ family, eight members of the CCT/TRiC complex, six members of the prefoldin/GimC complex, 22 Hsp40s, 1 Hsp60, 14 Hsp70s, and 2 Hsp90s. Our analysis provides a clear distinction between chaperones that are functionally promiscuous and chaperones that are functionally specific. We found that a given protein can interact with up to 25 different chaperones during its lifetime in the cell. The number of interacting chaperones was found to increase with the average number of hydrophobic stretches of length between one and five in a given protein. Importantly, cellular hot spots of chaperone interactions are elucidated. Our data suggest the presence of endogenous multicomponent chaperone modules in the cell.
chaperone modules; chaperone networks; protein folding; TAP-tag
One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans' biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a “systems-wide” functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.
One goal of modern biology is to chart groups of proteins that act together to perform biological processes via direct and indirect interactions. Such groupings are sometimes called functional modules. The types of protein interactions within modules include physical interactions that generate protein complexes and biochemical associations that make up metabolic pathways. We have combined proteomic and bioinformatic tools, and used them to decipher a large number of protein interactions, complexes, and functional modules with high confidence. In addition, exploring the topology of the resulting interaction networks, we successfully predicted specific biological roles for a number of proteins with previously unknown functions, and identified some potential drug targets. Although our work is focused on E. coli, our phylogenetic projections suggest that a considerable fraction of our observations and predictions can be extrapolated to many other bacterial taxa. As all the data derived from this study are publicly available, others may build on our work for further hypothesis-driven studies of gene function discovery.
A novel resource integrating proteomic and genome context-based tools provides a "systems-wide" functional blueprint of E. coli, with insights into the biological and evolutionary significance of previously uncharacterized proteins.