Genomic information on tumors from 50 cancer types catalogued by the International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, whereas many infrequently mutated genes may also contribute to tumor biology. Hence, there has been great interest in developing pathway and network analysis methods that group genes and illuminate the processes involved. We provide an overview of these analysis techniques and show where they can guide mechanistic and translational investigations.
The cancer secretome includes all of the macromolecules that cells secrete into their microenvironment. Cancer cell secretomes differ significantly from those of normal cells, reflecting the changes that normal cells undergo during their transition to malignancy. More importantly, cancer secretomes are known to be active mediators of both local and distant host cells and play an important role in the progression and dissemination of cancer. Here we have quantitatively profiled both the composition of breast cancer secretomes associated with osteotropism and their modulation under normoxic and hypoxic conditions. We detect and quantify 162 secretome proteins across all conditions, which show differential hypoxic induction and association with osteotropism. Mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium with the dataset identifier PXD000397, and the complete proteomic, bioinformatic and biological analyses are reported in Cox et al. (2015).
Proteomics; Breast cancer; Bone metastasis; Secretome; Hypoxia; Pre-metastatic Niche; Lysyl Oxidase
Cancer cells acquire pathological phenotypes through accumulation of mutations that perturb signaling networks. However, global analysis of these events is currently limited. Here, we identify six types of network-attacking mutations (NAMs), including changes in kinase and SH2 modulation, network rewiring, and the genesis and extinction of phosphorylation sites. We developed a computational platform (ReKINect) to identify NAMs and systematically interpreted the exomes and quantitative (phospho-)proteomes of five ovarian cancer cell lines and the global cancer genome repository. We identified and experimentally validated several NAMs, including PKCγ M501I and PKD1 D665N, which encode specificity switches analogous to the appearance of kinases de novo within the kinome. We discover mutant molecular logic gates, a drift toward phospho-threonine signaling, weakening of phosphorylation motifs, and kinase-inactivating hotspots in cancer. Our method pinpoints functional NAMs, scales with the complexity of cancer genomes and cell signaling, and may enhance our capability to therapeutically target tumor-specific networks.
• Mutations perturbing signaling networks are systematically classified and interpreted
• Several such functional mutations are identified in cancer and experimentally validated
• The results suggest that a single point mutant can have profound signaling effects
• Systematic interpretation of genomic data may assist future precision-medicine efforts
A systematic classification of genomic variants in cancer reveals the many ways in which signaling networks can be perturbed, including rewiring and the creation or destruction of phosphorylation sites.
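To make the "genesis and extinction of phosphorylation sites" categories concrete, here is a minimal Python sketch that labels a single missense variant by whether it removes or creates a phosphoacceptor residue (S, T or Y). The function name and the "other" label are hypothetical. The check deliberately ignores the flanking kinase motif and structural context that a classifier such as ReKINect would also need to consider.

```python
PHOSPHOACCEPTORS = frozenset("STY")


def classify_phosphosite_effect(sequence: str, position: int, ref: str, alt: str) -> str:
    """Label a 1-based missense variant as 'extinction', 'genesis' or 'other'.

    Hypothetical helper: only residue identity is considered, not the flanking
    motif that decides whether any kinase would actually recognise the site.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    if sequence[position - 1] != ref:
        raise ValueError(f"reference mismatch: sequence has {sequence[position - 1]!r}, not {ref!r}")
    was_acceptor = ref in PHOSPHOACCEPTORS
    is_acceptor = alt in PHOSPHOACCEPTORS
    if was_acceptor and not is_acceptor:
        return "extinction"  # a potential phosphosite is lost, e.g. S -> A
    if is_acceptor and not was_acceptor:
        return "genesis"  # a new potential phosphosite appears, e.g. P -> T
    return "other"  # acceptor swaps (S -> T) and changes between non-acceptors


if __name__ == "__main__":
    seq = "MKRSPEA"
    print(classify_phosphosite_effect(seq, 4, "S", "A"))  # extinction
    print(classify_phosphosite_effect(seq, 5, "P", "T"))  # genesis
```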
Protein kinases control cellular responses to environmental cues by swift and accurate signal processing. Breakdowns in this high-fidelity capability are a driving force in cancer and other diseases. Thus, our limited understanding of which amino acids in the kinase domain encode substrate specificity, the so-called determinants of specificity (DoS), constitutes a major obstacle to understanding cancer signaling. Here, we systematically discover several DoS and experimentally validate three of them, named the αC1, αC3, and APE-7 residues. We demonstrate that DoS form sparse networks of non-conserved residues spanning distant regions. Our results reveal a likely role for inter-residue allostery in specificity and an evolutionary decoupling of kinase activity and specificity, which appear to be encoded in independent groups of residues. Finally, we uncover similar properties driving SH2 domain specificity and demonstrate how the identification of DoS can help elucidate the role of signaling networks in cancer (Creixell et al., 2015 [this issue of Cell]).
• Residues driving specificity in the kinase and SH2 domains are globally identified
• Three new such residues, termed αC1, αC3, and APE-7, are experimentally validated
• Specificity and catalytic activity appear to be encoded in distinct sets of residues
• The global identification of determinants allows the modeling of rewiring mutations
Determining the residues that drive the specificity of kinases and of SH2 domains that bind phosphorylation sites paves the way for a systematic interpretation of mutations on signaling networks.
The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor, but only a minority drive tumor progression. We present the results of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype.
As François Jacob pointed out over 30 years ago, evolution is a tinkering process and, as such, relies on the genetic diversity produced by mutation and subsequently shaped by Darwinian selection. However, studies of this tinkering process typically rest on one implicit assumption: that all amino acid residues are equally likely to mutate or to result from a mutation. Here, by reconstructing ancestral sequences and computing mutational probabilities for all amino acid residues, we refute this assumption and show extensive inequalities between residues in terms of their mutational activity. Moreover, we highlight the genetic code and the physicochemical properties of amino acid residues as likely causes of these inequalities and identify serine as a mutational hot spot. Finally, we explore the consequences of these different mutational properties for phosphorylation site evolution, showing that phosphorylated threonine and, to a lesser extent, serine are more evolvable than tyrosine residues. As exemplified by the suppression of serine's mutational activity in phosphorylation sites, our results suggest that the cell can fine-tune the mutational activities of amino acid residues when they reside in functional protein regions.
amino acid evolvability; mutation; phosphorylation site evolution
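The contribution of the genetic code alone can be illustrated without ancestral reconstruction. The Python sketch below is a simplification, not the study's method. It uses the standard code (NCBI translation table 1) to compute, for each amino acid, the fraction of all single-nucleotide substitutions in its codons that change the encoded residue. It assumes uniform codon usage and equal rates for every nucleotide substitution, and neither assumption holds in real genomes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1) for codons TTT, TTC, TTA, TTG, TCT, ... GGG;
# '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def point_mutants(codon: str):
    """Yield the nine codons reachable from `codon` by one nucleotide substitution."""
    for i, base in enumerate(codon):
        for alt in BASES:
            if alt != base:
                yield codon[:i] + alt + codon[i + 1:]


def nonsynonymous_fraction(amino_acid: str) -> float:
    """Fraction of single-nucleotide changes to the codons of `amino_acid` that alter it."""
    codons = [c for c, aa in CODE.items() if aa == amino_acid]
    outcomes = [CODE[m] != amino_acid for c in codons for m in point_mutants(c)]
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    for aa in "STY":
        print(aa, f"{nonsynonymous_fraction(aa):.3f}")
    # S 0.741, T 0.667, Y 0.889
```

These figures reflect only the structure of the code table; the mutational probabilities in the study also incorporate observed substitution patterns.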
Policies supporting the rapid and open sharing of proteomic data are being implemented by the leading journals in the field. The proteomics community is taking steps to ensure that data are made publicly accessible and are of high quality, a challenging task that requires the development and deployment of methods for measuring and documenting data quality metrics. On September 18, 2010, the U.S. National Cancer Institute (NCI) convened the “International Workshop on Proteomic Data Quality Metrics” in Sydney, Australia, to identify and address issues facing the development and use of such methods for open access proteomics data. The stakeholders at the workshop enumerated the key principles underlying a framework for data quality assessment in mass spectrometry data that will meet the needs of the research community, journals, funding agencies, and data repositories. Attendees discussed and agreed upon two primary needs for the wide use of quality metrics: (1) an evolving list of comprehensive quality metrics and (2) standards accompanied by software analytics. Attendees stressed the importance of increased education and training programs to promote reliable protocols in proteomics. This workshop report explores the historic precedents, key discussions, and necessary next steps to enhance the quality of open access data.
By agreement, this article is published simultaneously in the Journal of Proteome Research, Molecular and Cellular Proteomics, Proteomics, and Proteomics Clinical Applications as a public service to the research community. The peer review process was a coordinated effort conducted by a panel of referees selected by the journals.
selected reaction monitoring; bioinformatics; data quality; metrics; open access; Amsterdam Principles; standards
Following genotoxic stress, cells activate a complex kinase-based signaling network to arrest the cell cycle and initiate DNA repair. p53-defective tumor cells rewire their checkpoint response and become dependent on the p38/MK2 pathway for survival after DNA damage, despite a functional ATR-Chk1 pathway. We used functional genetics to dissect the contributions of Chk1 and MK2 to checkpoint control. We show that nuclear Chk1 activity is essential to establish a G2/M checkpoint, while cytoplasmic MK2 activity is critical for prolonged checkpoint maintenance through a process of post-transcriptional mRNA stabilization. Following DNA damage, the p38/MK2 complex relocalizes from nucleus to cytoplasm, where MK2 phosphorylates hnRNPA0 to stabilize Gadd45α mRNA, while p38 phosphorylates and releases the translational inhibitor TIAR. In addition, MK2 phosphorylates PARN, blocking Gadd45α mRNA degradation. Gadd45α functions within a positive feedback loop, sustaining the MK2-dependent cytoplasmic sequestration of Cdc25B/C to block mitotic entry in the presence of unrepaired DNA damage. Our findings demonstrate a critical role for the MK2 pathway in the post-transcriptional regulation of gene expression as part of the DNA damage response in cancer cells.
John Nash showed that, within a complex system, individuals are best off if they make the best decision that they can, taking into account the decisions of the other individuals. Here, we investigate whether similar principles influence the evolution of signaling networks in multicellular animals. Specifically, by analyzing a set of metazoan species, we observe a striking negative correlation of genomically encoded tyrosine content with biological complexity (as measured by the number of cell types in each organism). We discuss how this observed tyrosine loss correlates with the expansion of tyrosine kinases in the evolution of the metazoan lineage and how it may relate to the optimization of signaling systems in multicellular animals. We propose that this phenomenon illustrates genome-wide adaptive evolution to accommodate beneficial genetic perturbation.
A combined computational and biochemical approach reveals how mitotic kinases allow cell division to proceed in the presence of DNA damage.
DNA damage checkpoints arrest cell cycle progression to facilitate DNA repair. The ability to survive genotoxic insults depends not only on the initiation of cell cycle checkpoints but also on checkpoint maintenance. While activation of DNA damage checkpoints has been studied extensively, molecular mechanisms involved in sustaining and ultimately inactivating cell cycle checkpoints are largely unknown. Here, we explored feedback mechanisms that control the maintenance and termination of checkpoint function by computationally identifying an evolutionarily conserved mitotic phosphorylation network within the DNA damage response. We demonstrate that the non-enzymatic checkpoint adaptor protein 53BP1 is an in vivo target of the cell cycle kinases cyclin-dependent kinase-1 and Polo-like kinase-1 (Plk1). We show that Plk1 binds 53BP1 during mitosis and that this interaction is required for proper inactivation of the DNA damage checkpoint. 53BP1 mutants that are unable to bind Plk1 fail to restart the cell cycle after ionizing radiation-mediated cell cycle arrest. Importantly, we show that Plk1 also phosphorylates the 53BP1-binding checkpoint kinase Chk2 to inactivate its FHA domain and inhibit its kinase activity in mammalian cells. Thus, a mitotic kinase-mediated negative feedback loop regulates the ATM-Chk2 branch of the DNA damage signaling network by phosphorylating conserved sites in 53BP1 and Chk2 to inactivate checkpoint signaling and control checkpoint duration.
DNA is constantly damaged both by factors outside our bodies (such as ultraviolet rays from sunlight) and by factors from within (such as reactive oxygen species produced during metabolism). DNA damage can lead to malfunctioning of genes, and persistent DNA damage can result in developmental disorders or the development of cancer. To ensure proper DNA repair, cells are equipped with an evolutionarily conserved DNA damage checkpoint, which stops proliferation and activates DNA repair mechanisms. Intriguingly, this DNA damage checkpoint responds to DNA damage throughout the cell cycle, except during mitosis. In this work, we have addressed how cells dismantle their DNA damage checkpoint during mitosis to allow cell division to proceed even if there is damaged DNA present. Using the observation that kinases phosphorylate their substrates on evolutionarily conserved, kinase-specific sequence motifs, we have used a combined computational and experimental approach to predict and verify key proteins involved in mitotic checkpoint inactivation. We show that the checkpoint scaffold protein 53BP1 is phosphorylated by the mitotic kinases Cdk1 and Polo-like kinase-1 (Plk1). Furthermore, we find that Plk1 can inactivate the checkpoint kinase Chk2, which is downstream of 53BP1. Plk1 is shown to be a key mediator of mitotic checkpoint inactivation, as cells that cannot activate Plk1 fail to properly dismantle the DNA damage checkpoint during mitosis and instead show DNA damage-induced Chk2 kinase activation. Two related papers, published in PLoS Biology (Vidanes et al., doi:10.1371/journal.pbio.1000286) and PLoS Genetics (Donnianni et al., doi:10.1371/journal.pgen.1000763), similarly investigate the phenomenon of DNA damage checkpoint silencing.
Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a ‘Bar Code’ format, which also displays known instances from homologous proteins through a novel ‘Instance Mapper’ protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation.
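ELM describes motifs with regular-expression patterns. The Python below is a rough sketch of the matching step, not the ELM server's code. It finds every match of a pattern, including overlapping ones, and renders a simple text "bar code" track for one sequence. The PxxP-style pattern `P..P` and both function names are illustrative assumptions.

```python
import re


def scan_motif(sequence: str, pattern: str):
    """Return 1-based (start, end, matched_text) for every, possibly overlapping, match."""
    # Wrapping the pattern in a zero-width lookahead lets finditer report overlapping hits.
    finder = re.compile(f"(?=({pattern}))")
    return [(m.start(1) + 1, m.end(1), m.group(1)) for m in finder.finditer(sequence)]


def bar_code(sequence: str, hits) -> str:
    """Render a one-line track: '#' for residues covered by a match, '-' elsewhere."""
    track = ["-"] * len(sequence)
    for start, end, _ in hits:
        track[start - 1:end] = "#" * (end - start + 1)
    return "".join(track)


if __name__ == "__main__":
    seq = "MAPPLPK"
    hits = scan_motif(seq, "P..P")
    print(hits)                 # [(3, 6, 'PPLP')]
    print(bar_code(seq, hits))  # --####-
```

Reporting overlapping matches matters for short, degenerate patterns, where adjacent instances often share residues.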
A key role of signal transduction pathways is to control transcriptional programs in the nucleus as a function of signals received by the cell via complex post-translational modification cascades. This determines cell-context specific responses to environmental stimuli. Given the difficulty of quantifying protein concentration and post-translational modifications, signaling pathway studies are still for the most part conducted one interaction at a time. Thus, genome-wide, cell-context specific dissection of signaling pathways is still an open challenge in molecular systems biology.
In this manuscript we extend the MINDy algorithm for the identification of post-translational modulators of transcription factor activity, to produce a first genome-wide map of the interface between signaling and transcriptional regulatory programs in human B cells. We show that the serine-threonine kinase STK38 emerges as the most pleiotropic signaling protein in this cellular context and we biochemically validate this finding by shRNA-mediated silencing of this kinase, followed by gene expression profile analysis. We also extensively validate the inferred interactions using protein-protein interaction databases and the kinase-substrate interaction prediction algorithm NetworKIN.
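One way to read "post-translational modulator of transcription factor activity" is a gene whose expression level changes how strongly a transcription factor and its target co-vary. MINDy builds on conditional mutual information, and the Python sketch below is a simplified illustration of that idea under stated assumptions. It splits samples into the lowest and highest modulator-expression fractions (the 35% default is an assumption) and estimates mutual information with a plain equal-frequency histogram, which is a simplification and not necessarily MINDy's estimator. It then reports the difference between the two subsets.

```python
import math
from collections import Counter


def _equal_frequency_bins(values, bins):
    """Assign each value a bin label 0..bins-1 by rank (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * bins // len(values)
    return labels


def mutual_information(x, y, bins=3):
    """Plug-in mutual information estimate, in nats, between two discretized profiles."""
    bx, by = _equal_frequency_bins(x, bins), _equal_frequency_bins(y, bins)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def modulation_score(modulator, tf, target, fraction=0.35, bins=3):
    """MI(TF; target) in high-modulator samples minus MI(TF; target) in low-modulator samples."""
    n = len(modulator)
    k = max(1, int(n * fraction))
    order = sorted(range(n), key=lambda i: modulator[i])
    low, high = order[:k], order[-k:]

    def mi(idx):
        return mutual_information([tf[i] for i in idx], [target[i] for i in idx], bins)

    return mi(high) - mi(low)
```

A large positive score suggests that high modulator expression strengthens the TF-target dependence; a real analysis would also need a significance test against permuted data.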
Protein kinases control cellular decision processes by phosphorylating specific substrates. Proteome-wide mapping has identified thousands of in vivo phosphorylation sites. However, systematically resolving which kinase targets each site is presently infeasible, due to the limited specificity of consensus motifs and the potential influence of contextual factors, such as protein scaffolds, localisation and expression, on cellular substrate specificity. We have therefore developed a computational method, NetworKIN, that augments motifs with context for kinases and phosphoproteins. This can pinpoint individual kinases responsible for specific in vivo phosphorylation events and yields a 2.5-fold improvement in the accuracy with which phosphorylation networks can be constructed. We show that context provides 60–80% of the computational capability to assign in vivo substrate specificity. Applying this approach to a DNA damage signalling network, we extend its cell-cycle regulation by showing that 53BP1 is a CDK1 substrate, show that Rad50 is phosphorylated by ATM kinase under genotoxic stress, and suggest novel roles of ATM in apoptosis. Finally, we present a scalable strategy to validate our predictions and use it to support the prediction that BCLAF1 is a GSK3 substrate.
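As a hedged illustration of "augmenting motifs with context", the Python sketch below combines a motif score and a context score for each candidate kinase and ranks the kinases for one phosphosite. The weighted geometric mean, the `motif_weight` parameter and the kinase names are illustrative assumptions made for this sketch, not NetworKIN's published scoring scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KinaseCandidate:
    kinase: str
    motif_score: float    # hypothetical: match of the site to the kinase's consensus motif, 0..1
    context_score: float  # hypothetical: network proximity of kinase and substrate, 0..1


def rank_kinases(candidates, motif_weight=0.5):
    """Rank kinases for one phosphosite by a weighted geometric mean of motif and context scores."""
    if not 0.0 <= motif_weight <= 1.0:
        raise ValueError("motif_weight must lie in [0, 1]")

    def combined(c):
        return c.motif_score ** motif_weight * c.context_score ** (1.0 - motif_weight)

    scored = [(c.kinase, combined(c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    site = [KinaseCandidate("KIN_A", 0.9, 0.1), KinaseCandidate("KIN_B", 0.5, 0.8)]
    print(rank_kinases(site, motif_weight=1.0)[0][0])  # KIN_A: motif score alone
    print(rank_kinases(site)[0][0])                     # KIN_B: context reorders the ranking
```

The example shows the qualitative point of the abstract: a kinase with the best motif match can lose to one that is far more plausible in its cellular context.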
Cellular signaling networks have evolved to enable swift and accurate responses, even in the face of genetic or environmental perturbation. Thus, genetic screens may not identify all the genes that regulate different biological processes. Moreover, although classical screening approaches have succeeded in providing parts lists of the essential components of signaling networks, they typically do not provide much insight into the hierarchical and functional relations that exist among these components. We describe a high-throughput screen in which we used RNA interference to systematically inhibit two genes simultaneously in 17,724 combinations to identify regulators of Drosophila JUN NH2-terminal kinase (JNK). Using both genetic and phosphoproteomics data, we then implemented an integrative network algorithm to construct a JNK phosphorylation network, which provides structural and mechanistic insights into the systems architecture of JNK signaling.
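Pairwise knockdown screens are commonly scored against a multiplicative expectation: if two genes act independently, the double-knockdown phenotype should equal the product of the single-knockdown phenotypes. The Python sketch below computes that deviation. The normalization to control = 1.0, the ±0.1 threshold and the labels are illustrative assumptions, not the scoring used in the JNK screen.

```python
def interaction_scores(single, double):
    """Multiplicative-model genetic interaction scores.

    single: gene -> phenotype of the single knockdown, normalized so that control = 1.0
    double: (gene_a, gene_b) -> phenotype of the double knockdown, same normalization
    Returns (gene_a, gene_b) -> observed minus expected, where expected = single[a] * single[b].
    """
    return {(a, b): obs - single[a] * single[b] for (a, b), obs in double.items()}


def classify(score, threshold=0.1):
    """Label a score 'aggravating' (< -threshold), 'alleviating' (> threshold) or 'none'."""
    if score < -threshold:
        return "aggravating"
    if score > threshold:
        return "alleviating"
    return "none"


if __name__ == "__main__":
    singles = {"a": 0.5, "b": 0.8, "c": 1.0}
    doubles = {("a", "b"): 0.2, ("a", "c"): 0.5}
    for pair, score in interaction_scores(singles, doubles).items():
        print(pair, round(score, 3), classify(score))
```

Deviations from the product, rather than raw double-knockdown phenotypes, are what reveal functional relationships between the two genes.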
Protein kinases control cellular responses by phosphorylating specific substrates. Recent proteome-wide mapping of protein phosphorylation sites by mass spectrometry has discovered thousands of in vivo sites. Systematically assigning all 518 human kinases to all these sites is a challenging problem. The NetworKIN database (http://networkin.info) integrates consensus substrate motifs with context modelling for improved prediction of cellular kinase–substrate relations. Based on the latest human phosphoproteome from the Phospho.ELM and PhosphoSite databases, the resource offers insight into phosphorylation-modulated interaction networks. Here, we describe how NetworKIN can be used for both global and targeted molecular studies. Via the web interface users can query the database of precomputed kinase–substrate relations or obtain predictions on novel phosphoproteins. The database currently contains a predicted phosphorylation network with 20 224 site-specific interactions involving 3978 phosphoproteins and 73 human kinases from 20 families.
WW domains are protein modules that mediate protein-protein interactions through recognition of proline-rich peptide motifs and phosphorylated serine/threonine-proline sites. To investigate the functional properties of WW domains, we employed mass spectrometry to identify 148 proteins that associate with 10 human WW domains. Many of these proteins represent novel WW domain-binding partners and are components of multiprotein complexes involved in molecular processes, such as transcription, RNA processing, and cytoskeletal regulation. We validated one complex in detail, showing that WW domains of the AIP4 E3 protein-ubiquitin ligase bind directly to a PPXY motif in the p68 subunit of pre-mRNA cleavage and polyadenylation factor Im in a manner that promotes p68 ubiquitylation. The tested WW domains fall into three broad groups on the basis of hierarchical clustering with respect to their associated proteins; each such cluster of bound proteins displayed a distinct set of WW domain-binding motifs. We also found that separate WW domains from the same protein or closely related proteins can have different specificities for protein ligands and also demonstrated that a single polypeptide can bind multiple classes of WW domains through separate proline-rich motifs. These data suggest that WW domains provide a versatile platform to link individual proteins into physiologically important networks.
Post-translational phosphorylation is one of the most common protein modifications. Phosphoserine, phosphothreonine and phosphotyrosine residues play critical roles in the regulation of many cellular processes. The fast-growing number of research reports on protein phosphorylation points to a general need for an accurate database dedicated to phosphorylation to provide easily retrievable information on phosphoproteins.
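The hierarchical clustering of WW domains by their associated proteins can be sketched with a Jaccard distance between bound-protein sets and average-linkage agglomeration. This pure-Python version is illustrative: the domain and protein names in the usage lines are hypothetical, and the study's actual clustering method and distance measure may differ.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (defined as 1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_linkage(ligands: dict, n_clusters: int) -> list:
    """Merge domains until n_clusters remain, always joining the pair of clusters with the
    smallest mean pairwise Jaccard distance between their bound-protein sets."""
    clusters = [[domain] for domain in sorted(ligands)]

    def distance(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(1.0 - jaccard(ligands[x], ligands[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > max(1, n_clusters):
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    bound = {
        "WW_1": {"p1", "p2", "p3"},
        "WW_2": {"p1", "p2"},
        "WW_3": {"p7", "p8"},
        "WW_4": {"p7", "p8", "p9"},
    }
    print(average_linkage(bound, 2))  # [['WW_1', 'WW_2'], ['WW_3', 'WW_4']]
```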
Phospho.ELM is a new resource containing experimentally verified phosphorylation sites manually curated from the literature and is developed as part of the ELM (Eukaryotic Linear Motif) resource. Phospho.ELM constitutes the largest searchable collection of phosphorylation sites available to the research community. The Phospho.ELM entries store information about substrate proteins with the exact positions of residues known to be phosphorylated by cellular kinases. Additional annotation includes literature references, subcellular compartment, tissue distribution, and information about the signaling pathways involved as well as links to the molecular interaction database MINT. Phospho.ELM version 2.0 contains 1703 phosphorylation site instances for 556 phosphorylated proteins.
Phospho.ELM will be a valuable tool both for molecular biologists working on protein phosphorylation sites and for bioinformaticians developing computational predictions on the specificity of phosphorylation reactions.
post-translational modification; protein kinase; bioinformatics
SRS (Sequence Retrieval System) is a widely used keyword search engine for querying biological databases. BLAST2 is the most widely used tool to query databases by sequence similarity search. These tools allow users to retrieve sequences by shared keyword or by shared similarity, with many public web servers available. However, with the increasingly large datasets available it is now quite common that a user is interested in some subset of homologous sequences but has no efficient way to restrict retrieval to that set. By allowing the user to control SRS from the BLAST output, BLAST2SRS (http://blast2srs.embl.de/) aims to meet this need. This server therefore combines the two ways to search sequence databases: similarity and keyword.
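The BLAST2SRS idea, keyword retrieval restricted to a set of homologs, can be mimicked locally. The Python sketch below is not the server's implementation. It reads BLAST tabular output in the 12-column `-outfmt 6` layout of BLAST+ (subject ID in column 2, e-value in column 11), keeps significant hits, and filters them by a keyword. The `annotations` dictionary stands in for an SRS keyword query and is a hypothetical input, as is the 1e-5 e-value cutoff.

```python
import csv
import io


def blast_hits(tabular_text: str, max_evalue: float = 1e-5) -> set:
    """Subject IDs from BLAST tabular (-outfmt 6) output with e-value <= max_evalue."""
    hits = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank and comment lines
            continue
        sseqid, evalue = row[1], float(row[10])
        if evalue <= max_evalue:
            hits.add(sseqid)
    return hits


def restrict_to_keyword(hits: set, annotations: dict, keyword: str) -> list:
    """Keep hits whose annotation text contains `keyword` (case-insensitive), sorted by ID."""
    kw = keyword.lower()
    return sorted(h for h in hits if kw in annotations.get(h, "").lower())
```

Applying the keyword filter only to the similarity hits mirrors the combination described above: similarity defines the candidate set, keywords narrow it.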
Multidomain proteins predominate in eukaryotic proteomes. Individual functions assigned to different sequence segments combine to create a complex function for the whole protein. While on-line resources are available for revealing globular domains in sequences, there has hitherto been no comprehensive collection of small functional sites/motifs comparable to the globular domain resources, yet these are as important for the function of multidomain proteins. Short linear peptide motifs are used for cell compartment targeting, protein–protein interaction, regulation by phosphorylation, acetylation, glycosylation and a host of other post-translational modifications. ELM, the Eukaryotic Linear Motif server at http://elm.eu.org/, is a new bioinformatics resource for investigating candidate short non-globular functional motifs in eukaryotic proteins, aiming to fill the void in bioinformatics tools. Sequence comparisons with short motifs are difficult to evaluate because the usual significance assessments are inappropriate. Therefore the server is implemented with several logical filters to eliminate false positives. Current filters are for cell compartment, globular domain clash and taxonomic range. In favourable cases, the filters can reduce the number of retained matches by an order of magnitude or more.
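The three filters named above can be chained as simple set and interval tests. In this hypothetical Python sketch, compartment labels, taxonomic lineages and domain boundaries are plain inputs. The ELM server draws them from curated annotation, and the exact rules it applies may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifMatch:
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def _overlaps(match: MotifMatch, domain: tuple) -> bool:
    return match.start <= domain[1] and domain[0] <= match.end


def apply_filters(matches, *, domains, protein_compartments, motif_compartments,
                  protein_lineage, motif_taxa):
    """Drop candidate matches that fail compartment, taxonomy or globular-domain checks."""
    # Cell compartment filter: protein and motif must share at least one compartment.
    if not set(protein_compartments) & set(motif_compartments):
        return []
    # Taxonomic range filter: the protein's lineage must include a taxon the motif occurs in.
    if not set(protein_lineage) & set(motif_taxa):
        return []
    # Globular domain clash filter: discard matches overlapping an annotated structured domain.
    return [m for m in matches if not any(_overlaps(m, d) for d in domains)]
```

The first two filters reject every match in a protein at once, while the domain clash filter acts on each match individually.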
A major challenge in the proteomics and structural genomics era is to predict protein structure and function, including identification of those proteins that are partially or wholly unstructured. Non-globular sequence segments often contain short linear peptide motifs (e.g. SH3-binding sites) which are important for protein function. We present here a new tool for discovery of such unstructured, or disordered regions within proteins. GlobPlot (http://globplot.embl.de) is a web service that allows the user to plot the tendency within the query protein for order/globularity and disorder. We show examples with known proteins where it successfully identifies inter-domain segments containing linear motifs, and also apparently ordered regions that do not contain any recognised domain. GlobPlot may be useful in domain hunting efforts. The plots indicate that instances of known domains may often contain additional N- or C-terminal segments that appear ordered. Thus GlobPlot may be of use in the design of constructs corresponding to globular proteins, as needed for many biochemical studies, particularly structural biology. GlobPlot has a pipeline interface—GlobPipe—for the advanced user to do whole proteome analysis. GlobPlot can also be used as a generic infrastructure package for graphical displaying of any possible propensity.
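A propensity display of this kind can be sketched as a running sum of per-residue propensities: where the cumulative curve climbs, local propensities favor disorder. The Python sketch below assumes a positive-means-disorder sign convention and uses a toy propensity scale (`TOY_SCALE`) whose values are placeholders, not the scale GlobPlot uses. The window size and minimum segment length are also arbitrary choices.

```python
from itertools import accumulate

# Placeholder propensities (positive = disorder-prone). NOT a published scale.
TOY_SCALE = {"P": 0.5, "E": 0.4, "S": 0.3, "K": 0.3, "G": 0.2,
             "W": -0.6, "F": -0.5, "I": -0.5, "L": -0.4, "V": -0.4}


def cumulative_propensity(sequence, scale):
    """Running sum of per-residue propensities; residues missing from the scale count as 0."""
    return list(accumulate(scale.get(aa, 0.0) for aa in sequence))


def disordered_segments(sequence, scale, window=7, min_length=5):
    """1-based (start, end) runs where the windowed mean propensity is positive,
    i.e. where the cumulative curve rises across the window."""
    values = [scale.get(aa, 0.0) for aa in sequence]
    half = window // 2
    flags = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        flags.append(sum(values[lo:hi]) / (hi - lo) > 0)
    segments, start = [], None
    for i, flag in enumerate(flags + [False]):  # sentinel closes a run at the C-terminus
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                segments.append((start + 1, i))
            start = None
    return segments


if __name__ == "__main__":
    seq = "L" * 10 + "PEPSPEPSPE" + "L" * 10
    print(disordered_segments(seq, TOY_SCALE))  # [(11, 20)]
```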
Many aspects of cell signalling, trafficking, and targeting are governed by interactions between globular protein domains and short peptide segments. These domains often bind multiple peptides that share a common sequence pattern, or “linear motif” (e.g., SH3 binding to PxxP). Many domains are known, though comparatively few linear motifs have been discovered. Their short length (three to eight residues) and the fact that they often reside in disordered regions in proteins make them difficult to detect through sequence comparison or experiment. Nevertheless, each new motif provides critical molecular details of how interaction networks are constructed and can explain how one protein is able to bind to very different partners. Here we show that binding motifs can be detected using data from genome-scale interaction studies, thus avoiding the normally slow discovery process. Our approach, based on motif over-representation in non-homologous sequences, rediscovers known motifs and predicts dozens of others. Direct binding experiments reveal that two predicted motifs are indeed protein-binding modules: a DxxDxxxD protein phosphatase 1 binding motif with a KD of 22 μM and a VxxxRxYS motif that binds Translin with a KD of 43 μM. We estimate that there are dozens or even hundreds of linear motifs yet to be discovered that will give molecular insight into protein networks and greatly illuminate cellular processes.
Many protein interactions are mediated by short amino acid motifs. The authors describe a new approach to identify these interaction motifs and experimentally validate some of their binding predictions.
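Over-representation can be framed as a counting test: how many unrelated sequences contain the motif, compared with what chance predicts. The Python sketch below uses a binomial tail probability together with a crude independence approximation for the chance of a match. Both choices are illustrative assumptions, the input sequences are assumed to be non-homologous already, and the published approach may score motifs differently.

```python
import math
import re


def motif_support(sequences, pattern):
    """Number of sequences containing at least one match of the regular expression."""
    regex = re.compile(pattern)
    return sum(1 for seq in sequences if regex.search(seq))


def chance_of_match(fixed_residues, span, length, background):
    """Approximate probability that a random sequence of `length` residues contains the motif.

    fixed_residues: the defined positions of the motif (wildcards omitted), e.g. "DDD" for D..D...D
    span: total motif length including wildcards, e.g. 8 for D..D...D
    Treats start positions as independent, which ignores overlaps between them.
    """
    q = math.prod(background[aa] for aa in fixed_residues)
    starts = max(0, length - span + 1)
    return 1.0 - (1.0 - q) ** starts


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    seqs = ["AADKKDLLLD", "DAAD", "PPPP"]  # toy sequences, non-homologous by assumption
    k = motif_support(seqs, "D..D...D")
    p = chance_of_match("DDD", 8, 10, {"D": 0.05})
    print(k, binomial_tail(k, len(seqs), p))
```

A small tail probability marks a motif shared by more unrelated partners of the same protein than chance would explain.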