1.  Efficient prediction of human protein-protein interactions at a global scale 
BMC Bioinformatics  2014;15(1):383.
Our knowledge of global protein-protein interaction (PPI) networks in complex organisms such as humans is hindered by technical limitations of current methods.
On the basis of short co-occurring polypeptide regions, we developed a tool called MP-PIPE capable of predicting a global human PPI network within 3 months. With a recall of 23% at a precision of 82.1%, we predicted 172,132 putative PPIs. We demonstrate the usefulness of these predictions through a range of experiments.
The speed and accuracy associated with MP-PIPE can make this a potential tool to study individual human PPI networks (from genomic sequences alone) for personalized medicine.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-014-0383-1) contains supplementary material, which is available to authorized users.
PMCID: PMC4272565  PMID: 25492630
Protein-protein interactions; Computational prediction; Human proteome; Massively parallel computing; Personalized medicine; Interactome; Network analysis
2.  The binary protein-protein interaction landscape of Escherichia coli 
Nature biotechnology  2014;32(3):285-290.
Efforts to map the Escherichia coli interactome have identified several hundred macromolecular complexes, but direct binary protein-protein interactions (PPIs) have not been surveyed on a large scale. Here we performed yeast two-hybrid screens of 3,305 baits against 3,606 preys (~70% of the E. coli proteome) in duplicate to generate a map of 2,234 interactions, approximately doubling the number of known binary PPIs in E. coli. Integration of binary PPIs and genetic interactions revealed functional dependencies among components involved in cellular processes, including envelope integrity, flagellum assembly and protein quality control. Many of the binary interactions that could be mapped within multi-protein complexes were informative regarding internal topology and indicated that interactions within complexes are significantly more conserved than those interactions connecting different complexes. This resource will be useful for inferring bacterial gene function and provides a draft reference of the basic physical wiring network of this evolutionarily significant model microbe.
PMCID: PMC4123855  PMID: 24561554
3.  Mapping the functional yeast ABC transporter interactome 
Nature chemical biology  2013;9(9):10.1038/nchembio.1293.
ABC transporters are a ubiquitous class of integral membrane proteins of immense clinical interest because of their strong association with human disease and pharmacology. To improve our understanding of these proteins, we used Membrane Yeast Two-Hybrid (MYTH) technology to map the protein interactome of all non-mitochondrial ABC transporters in the model organism Saccharomyces cerevisiae, and combined this data with previously reported yeast ABC transporter interactions in the BioGRID database to generate a comprehensive, integrated interactome. We show that ABC transporters physically associate with proteins involved in a surprisingly diverse range of functions. We specifically examine the importance of the physical interactions of ABC transporters in both the regulation of one another and in the modulation of proteins involved in zinc homeostasis. The interaction network presented here will be a powerful resource for increasing our fundamental understanding of the cellular role and regulation of ABC transporters.
PMCID: PMC3835492  PMID: 23831759
4.  Quantitative Genome-Wide Genetic Interaction Screens Reveal Global Epistatic Relationships of Protein Complexes in Escherichia coli 
PLoS Genetics  2014;10(2):e1004120.
Large-scale proteomic analyses in Escherichia coli have documented the composition and physical relationships of multiprotein complexes, but not their functional organization into biological pathways and processes. Conversely, genetic interaction (GI) screens can provide insights into the biological role(s) of individual gene and higher order associations. Combining the information from both approaches should elucidate how complexes and pathways intersect functionally at a systems level. However, such integrative analysis has been hindered due to the lack of relevant GI data. Here we present a systematic, unbiased, and quantitative synthetic genetic array screen in E. coli describing the genetic dependencies and functional cross-talk among over 600,000 digenic mutant combinations. Combining this epistasis information with putative functional modules derived from previous proteomic data and genomic context-based methods revealed unexpected associations, including new components required for the biogenesis of iron-sulphur and ribosome integrity, and the interplay between molecular chaperones and proteases. We find that functionally-linked genes co-conserved among γ-proteobacteria are far more likely to have correlated GI profiles than genes with divergent patterns of evolution. Overall, examining bacterial GIs in the context of protein complexes provides avenues for a deeper mechanistic understanding of core microbial systems.
Author Summary
Genome-wide genetic interaction (GI) screens have been performed in yeast, but no analogous large-scale studies have yet been reported for bacteria. Here, we have used E. coli synthetic genetic array (eSGA) technology developed by our group to quantitatively map GIs to reveal epistatic dependencies and functional cross-talk among ∼600,000 digenic mutant combinations. By combining this epistasis information with functional modules derived by our group's earlier efforts from proteomic and genomic context (GC)-based methods, we identify several unexpected pathway-level dependencies, functional links between protein complexes, and biological roles of uncharacterized bacterial gene products. As part of the study, two of our pathway predictions from GI screens were validated experimentally, where we confirmed the role of these new components in iron-sulphur biogenesis and ribosome integrity. We also extrapolated the epistatic connectivity diagram of E. coli to 233 distantly related γ-proteobacterial species lacking GI information, and identified co-conserved genes and functional modules important for bacterial pathogenesis. Overall, this study describes the first genome-scale map of GIs in a gram-negative bacterium, and through integrative analysis with previously derived protein-protein and GC-based interaction networks presents a number of novel insights into the architecture of bacterial pathways that could not have been discerned through either network alone.
PMCID: PMC3930520  PMID: 24586182
5.  Phosphatase Complex Pph3/Psy2 Is Involved in Regulation of Efficient Non-Homologous End-Joining Pathway in the Yeast Saccharomyces cerevisiae 
PLoS ONE  2014;9(1):e87248.
One of the main mechanisms for double stranded DNA break (DSB) repair is through the non-homologous end-joining (NHEJ) pathway. Using plasmid and chromosomal repair assays, we showed that deletion mutant strains for interacting proteins Pph3p and Psy2p had reduced efficiencies in NHEJ. We further observed that this activity of Pph3p and Psy2p appeared linked to cell cycle Rad53p and Chk1p checkpoint proteins. Pph3/Psy2 is a phosphatase complex, which regulates recovery from the Rad53p DNA damage checkpoint. Overexpression of Chk1p checkpoint protein in a parallel pathway to Rad53p compensated for the deletion of PPH3 or PSY2 in a chromosomal repair assay. Double mutant strains Δpph3/Δchk1 and Δpsy2/Δchk1 showed additional reductions in the efficiency of plasmid repair compared to both single deletions, which is in agreement with the activity of Pph3p and Psy2p in a parallel pathway to Chk1p. Genetic interaction analyses also supported a role for Pph3p and Psy2p in DNA damage repair, the NHEJ pathway, as well as cell cycle progression. Collectively, we report that the activity of Pph3p and Psy2p further connects NHEJ repair to cell cycle progression.
PMCID: PMC3909046  PMID: 24498054
6.  The MoxR ATPase RavA and Its Cofactor ViaA Interact with the NADH:Ubiquinone Oxidoreductase I in Escherichia coli 
PLoS ONE  2014;9(1):e85529.
MoxR ATPases are widespread throughout bacteria and archaea. The experimental evidence to date suggests that these proteins have chaperone-like roles in facilitating the maturation of dedicated protein complexes that are functionally diverse. In Escherichia coli, the MoxR ATPase RavA and its putative cofactor ViaA are found to exist in early stationary-phase cells at 37°C at low levels of about 350 and 90 molecules per cell, respectively. Both proteins are predominantly localized to the cytoplasm, but ViaA was also unexpectedly found to localize to the cell membrane. Whole genome microarrays and synthetic lethality studies both indicated that RavA-ViaA are genetically linked to Fe-S cluster assembly and specific respiratory pathways. Systematic analysis of mutant strains of ravA and viaA indicated that RavA-ViaA sensitizes cells to sublethal concentrations of aminoglycosides. Furthermore, this effect was dependent on RavA's ATPase activity, and on the presence of specific subunits of NADH:ubiquinone oxidoreductase I (Nuo Complex, or Complex I). Importantly, both RavA and ViaA were found to physically interact with specific Nuo subunits. We propose that RavA-ViaA facilitate the maturation of the Nuo complex.
PMCID: PMC3893208  PMID: 24454883
7.  ER exit sites are physical and functional core autophagosome biogenesis components 
Molecular Biology of the Cell  2013;24(18):2918-2931.
ERES function is required for assembly of the autophagy machinery immediately downstream of the Atg1 kinase complex and is associated with formation of autophagosomes at every stage of the process. ERES are core components of the autophagy machinery for the biogenesis of autophagosomes.
Autophagy is a central homeostasis and stress response pathway conserved in all eukaryotes. One hallmark of autophagy is the de novo formation of autophagosomes. These double-membrane vesicular structures form around and deliver cargo for degradation by the vacuole/lysosome. Where and how autophagosomes form are outstanding questions. Here we show, using proteomic, cytological, and functional analyses, that autophagosomes are spatially, physically, and functionally linked to endoplasmic reticulum exit sites (ERES), which are specialized regions of the endoplasmic reticulum where COPII transport vesicles are generated. Our data demonstrate that ERES are core autophagosomal biogenesis components whose function is required for the hierarchical assembly of the autophagy machinery immediately downstream of the Atg1 kinase complex at phagophore assembly sites.
PMCID: PMC3771953  PMID: 23904270
8.  A negative genetic interaction map in isogenic cancer cell lines reveals cancer cell vulnerabilities 
This study defines a network of synthetic sick/lethal interactions with a set of query genes in a series of isogenic cancer cell lines. Analysis of differential essentiality reveals general properties in genetic interaction networks derived from studies on model organisms.
This study defined about 200 negative genetic interactions in the isogenic cancer cell line background. Mapping of negative genetic interactions in a systematic fashion in isogenic cancer cell lines has revealed novel functions for several uncharacterized genes. This study demonstrates that differential essentiality profiles derived from isogenic cancer cell lines can be used to classify genetic dependencies in non-isogenic cancer cell lines.
Improved efforts are necessary to define the functional product of cancer mutations currently being revealed through large-scale sequencing efforts. Using genome-scale pooled shRNA screening technology, we mapped negative genetic interactions across a set of isogenic cancer cell lines and confirmed hundreds of these interactions in orthogonal co-culture competition assays to generate a high-confidence genetic interaction network of differentially essential or differential essentiality (DiE) genes. The network uncovered examples of conserved genetic interactions, densely connected functional modules derived from comparative genomics with model systems data, functions for uncharacterized genes in the human genome and targetable vulnerabilities. Finally, we demonstrate a general applicability of DiE gene signatures in determining genetic dependencies of other non-isogenic cancer cell lines. For example, the PTEN−/− DiE genes reveal a signature that can preferentially classify PTEN-dependent genotypes across a series of non-isogenic cell lines derived from the breast, pancreas and ovarian cancers. Our reference network suggests that many cancer vulnerabilities remain to be discovered through systematic derivation of a network of differentially essential genes in an isogenic cancer cell model.
PMCID: PMC3817404  PMID: 24104479
genetic interaction; genome stability; mitotic stress; pooled shRNA screening
9.  A Census of Human Soluble Protein Complexes 
Cell  2012;150(5):1068-1081.
Cellular processes often depend on stable physical associations between proteins. Despite recent progress, knowledge of the composition of human protein complexes remains limited. To close this gap, we applied an integrative global proteomic profiling approach, based on chromatographic separation of cultured human cell extracts into more than one thousand biochemical fractions which were subsequently analyzed by quantitative tandem mass spectrometry, to systematically identify a network of 13,993 high-confidence physical interactions among 3,006 stably-associated soluble human proteins. Most of the 622 putative protein complexes we report are linked to core biological processes, and encompass both candidate disease genes and unannotated proteins to inform on mechanism. Strikingly, whereas larger multi-protein assemblies tend to be more extensively annotated and evolutionarily conserved, human protein complexes with 5 or fewer subunits are far more likely to be functionally un-annotated or restricted to vertebrates, suggesting more recent functional innovations.
PMCID: PMC3477804  PMID: 22939629
10.  A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair 
Molecular microbiology  2010;79(2):484-502.
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions, replication forks, and 5′-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
PMCID: PMC3071548  PMID: 21219465
Cas1; CRISPR; DNA recombination; DNA repair; nuclease; YgbT
11.  Genetic Interaction Maps in Escherichia coli Reveal Functional Crosstalk among Cell Envelope Biogenesis Pathways 
PLoS Genetics  2011;7(11):e1002377.
As the interface between a microbe and its environment, the bacterial cell envelope has broad biological and clinical significance. While numerous biosynthesis genes and pathways have been identified and studied in isolation, how these intersect functionally to ensure envelope integrity during adaptive responses to environmental challenge remains unclear. To this end, we performed high-density synthetic genetic screens to generate quantitative functional association maps encompassing virtually the entire cell envelope biosynthetic machinery of Escherichia coli under both auxotrophic (rich medium) and prototrophic (minimal medium) culture conditions. The differential patterns of genetic interactions detected among >235,000 digenic mutant combinations tested reveal unexpected condition-specific functional crosstalk and genetic backup mechanisms that ensure stress-resistant envelope assembly and maintenance. These networks also provide insights into the global systems connectivity and dynamic functional reorganization of a universal bacterial structure that is both broadly conserved among eubacteria (including pathogens) and an important target.
Author Summary
Proper assembly of the cell envelope is essential for bacterial growth, environmental adaptation, and drug resistance. Yet, while the biological roles of the many genes and pathways involved in biosynthesis of the cell envelope have been studied extensively in isolation, how the myriad components intersect functionally to maintain envelope integrity under different growth conditions has not been explored systematically. Genome-scale genetic interaction screens have increasingly been performed to great impact in yeast; no analogous comprehensive studies have yet been reported for bacteria despite their prominence in human health and disease. We addressed this by using a synthetic genetic array technology to generate quantitative maps of genetic interactions encompassing virtually all the components of the cell envelope biosynthetic machinery of the classic model bacterium E. coli in two common laboratory growth conditions (rich and minimal medium). From the resulting networks of high-confidence genetic interactions, we identify condition-specific functional dependencies underlying envelope assembly and global remodeling of genetic backup mechanisms that ensure envelope integrity under environmental challenge.
PMCID: PMC3219608  PMID: 22125496
12.  Ribosome-Dependent ATPase Interacts with Conserved Membrane Protein in Escherichia coli to Modulate Protein Synthesis and Oxidative Phosphorylation 
PLoS ONE  2011;6(4):e18510.
Elongation factor RbbA is required for ATP-dependent deacyl-tRNA release presumably after each peptide bond formation; however, there is no information about the cellular role. Proteomic analysis in Escherichia coli revealed that RbbA reciprocally co-purified with a conserved inner membrane protein of unknown function, YhjD. Both proteins are also physically associated with the 30S ribosome and with members of the lipopolysaccharide transport machinery. Genome-wide genetic screens of rbbA and yhjD deletion mutants revealed aggravating genetic interactions with mutants deficient in the electron transport chain. Cells lacking both rbbA and yhjD exhibited reduced cell division, respiration and global protein synthesis as well as increased sensitivity to antibiotics targeting the ETC and the accuracy of protein synthesis. Our results suggest that RbbA appears to function together with YhjD as part of a regulatory network that impacts bacterial oxidative phosphorylation and translation efficiency.
PMCID: PMC3083400  PMID: 21556145
13.  Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells
Science (New York, N.Y.)  2010;329(5991):533-538.
Protein and mRNA copy numbers vary from cell to cell in isogenic bacterial populations. However, these molecules often exist in low copy numbers, and are difficult to detect in single cells. Here we carried out quantitative system-wide analyses of protein and mRNA expression in individual cells with single-molecule sensitivity using a newly constructed yellow fluorescent protein fusion library for Escherichia coli. We found that almost all protein number distributions can be described by the gamma distribution with two fitting parameters which, at low expression levels, have clear physical interpretations as the transcription rate and protein burst size. At high expression levels, the distributions are dominated by extrinsic noise. Strikingly, we found that a single cell's protein and mRNA copy numbers for any given gene are uncorrelated.
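For reference, the two-parameter gamma distribution mentioned above can be written in its standard shape-scale form. The symbols a and b are assumptions introduced here for illustration and are not taken from the abstract; under the usual reading of bursty expression, a would play the role of the rate parameter (bursts per cell cycle) and b the protein burst size.

```latex
% Gamma distribution for protein copy number x (shape a, scale b);
% a and b are illustrative symbols, not notation from the abstract.
\[
  p(x) = \frac{x^{a-1} e^{-x/b}}{\Gamma(a)\, b^{a}}, \qquad x \ge 0 .
\]
% Moments that follow directly from this form:
\[
  \mu = a b, \qquad \sigma^{2} = a b^{2}, \qquad
  \frac{\sigma^{2}}{\mu^{2}} = \frac{1}{a}, \qquad
  \frac{\sigma^{2}}{\mu} = b .
\]
```

Written this way, the squared coefficient of variation depends only on a and the Fano factor only on b, which is why the two fitted parameters can be read off from the mean and variance of a measured distribution.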
PMCID: PMC2922915  PMID: 20671182
14.  Computational and experimental approaches to chart the Escherichia coli cell-envelope-associated proteome and interactome 
FEMS Microbiology Reviews  2008;33(1):66-97.
The bacterial cell-envelope consists of a complex arrangement of lipids, proteins and carbohydrates that serves as the interface between a microorganism and its environment or, with pathogens, a human host. Escherichia coli has long been investigated as a leading model system to elucidate the fundamental mechanisms underlying microbial cell-envelope biology. This includes extensive descriptions of the molecular identities, biochemical activities and evolutionary trajectories of integral transmembrane proteins, many of which play critical roles in infectious disease and antibiotic resistance. Strikingly, however, only half of the c. 1200 putative cell-envelope-related proteins of E. coli currently have experimentally attributed functions, indicating an opportunity for discovery. In this review, we summarize the state of the art of computational and proteomic approaches for determining the components of the E. coli cell-envelope proteome, as well as exploring the physical and functional interactions that underlie its biogenesis and functionality. We also provide a comprehensive comparative benchmarking analysis on the performance of different bioinformatic and proteomic methods commonly used to determine the subcellular localization of bacterial proteins.
PMCID: PMC2704936  PMID: 19054114
cell-envelope; Escherichia coli; subcellular localization; algorithms; bioinformatics; proteomic methods
15.  Global Functional Atlas of Escherichia coli Encompassing Previously Uncharacterized Proteins 
PLoS Biology  2009;7(4):e1000096.
One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans' biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a “systems-wide” functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.
Author Summary
One goal of modern biology is to chart groups of proteins that act together to perform biological processes via direct and indirect interactions. Such groupings are sometimes called functional modules. The types of protein interactions within modules include physical interactions that generate protein complexes and biochemical associations that make up metabolic pathways. We have combined proteomic and bioinformatic tools, and used them to decipher a large number of protein interactions, complexes, and functional modules with high confidence. In addition, exploring the topology of the resulting interaction networks, we successfully predicted specific biological roles for a number of proteins with previously unknown functions, and identified some potential drug targets. Although our work is focused on E. coli, our phylogenetic projections suggest that a considerable fraction of our observations and predictions can be extrapolated to many other bacterial taxa. As all the data derived from this study are publicly available, others may build on our work for further hypothesis-driven studies of gene function discovery.
A novel resource integrating proteomic and genome context-based tools provides a "systems-wide" functional blueprint of E. coli, with insights into the biological and evolutionary significance of previously uncharacterized proteins.
PMCID: PMC2672614  PMID: 19402753
16.  Recombination analysis of Soybean mosaic virus sequences reveals evidence of RNA recombination between distinct pathotypes 
Virology Journal  2008;5:143.
RNA recombination is one of the two major factors that create RNA genome variability. Assessing its incidence in plant RNA viruses helps understand the formation of new isolates and evaluate the effectiveness of crop protection strategies. To search for recombination in Soybean mosaic virus (SMV), the causal agent of a worldwide seed-borne, aphid-transmitted viral soybean disease, we obtained all full-length genome sequences of SMV as well as partial sequences encoding the N-terminal most (P1 protease) and the C-terminal most (capsid protein; CP) viral protein. The sequences were analyzed for possible recombination events using a variety of automatic and manual recombination detection and verification approaches. Automatic scanning identified 3, 10, and 17 recombination sites in the P1, CP, and full-length sequences, respectively. Manual analyses confirmed 10 recombination sites in three full-length SMV sequences. To our knowledge, this is the first report of recombination between distinct SMV pathotypes. These data imply that different SMV pathotypes can simultaneously infect a host cell and exchange genetic materials through recombination. The high incidence of SMV recombination suggests that recombination plays an important role in SMV evolution. Obtaining additional full-length sequences will help elucidate this role.
PMCID: PMC2627826  PMID: 19036160
17.  Altered gene expression changes in Arabidopsis leaf tissues and protoplasts in response to Plum pox virus infection 
BMC Genomics  2008;9:325.
Virus infection induces the activation and suppression of global gene expression in the host. Profiling gene expression changes in the host may provide insights into the molecular mechanisms that underlie host physiological and phenotypic responses to virus infection. In this study, the Arabidopsis Affymetrix ATH1 array was used to assess global gene expression changes in Arabidopsis thaliana plants infected with Plum pox virus (PPV). To identify early genes in response to PPV infection, an Arabidopsis synchronized single-cell transformation system was developed. Arabidopsis protoplasts were transfected with a PPV infectious clone and global gene expression changes in the transfected protoplasts were profiled.
Microarray analysis of PPV-infected Arabidopsis leaf tissues identified 2013 and 1457 genes that were significantly (Q ≤ 0.05) up- (≥ 2.5 fold) and downregulated (≤ -2.5 fold), respectively. Genes associated with soluble sugar, starch and amino acid, intracellular membrane/membrane-bound organelles, chloroplast, and protein fate were upregulated, while genes related to development/storage proteins, protein synthesis and translation, and cell wall-associated components were downregulated. These gene expression changes were associated with PPV infection and symptom development. Further transcriptional profiling of protoplasts transfected with a PPV infectious clone revealed the upregulation of defence and cellular signalling genes as early as 6 hours post transfection. A cross sequence comparison analysis of genes differentially regulated by PPV-infected Arabidopsis leaves against uniEST sequences derived from PPV-infected leaves of Prunus persica, a natural host of PPV, identified orthologs related to defence, metabolism and protein synthesis. The cross comparison of genes differentially regulated by PPV infection and by the infections of other positive sense RNA viruses revealed a common set of 416 genes. These identified genes, particularly the early responsive genes, may be critical in virus infection.
Gene expression changes in PPV-infected Arabidopsis are the molecular basis of stress and defence-like responses, PPV pathogenesis and symptom development. The differentially regulated genes, particularly the early responsive genes, and a common set of genes regulated by infections of PPV and other positive sense RNA viruses identified in this study are candidates suitable for further functional characterization to shed light on molecular virus-host interactions.
PMCID: PMC2478689  PMID: 18613973

