The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor, but only a minority drive tumor progression. We present the result of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype.
As François Jacob pointed out over 30 years ago, evolution is a tinkering process, and, as such, relies on the genetic diversity produced by mutation subsequently shaped by Darwinian selection. However, there is one implicit assumption that is made when studying this tinkering process; it is typically assumed that all amino acid residues are equally likely to mutate or to result from a mutation. Here, by reconstructing ancestral sequences and computing mutational probabilities for all the amino acid residues, we refute this assumption and show extensive inequalities between different residues in terms of their mutational activity. Moreover, we highlight the importance of the genetic code and physico-chemical properties of the amino acid residues as likely causes of these inequalities and uncover serine as a mutational hot spot. Finally, we explore the consequences that these different mutational properties have on phosphorylation site evolution, showing that a higher degree of evolvability exists for phosphorylated threonine and, to a lesser extent, serine in comparison with tyrosine residues. As exemplified by the suppression of serine's mutational activity in phosphorylation sites, our results suggest that the cell can fine-tune the mutational activities of amino acid residues when they reside in functional protein regions.
amino acid evolvability; mutation; phosphorylation site evolution
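To make the idea of a per-residue mutational probability concrete, the following minimal sketch counts, for each amino acid, how often an aligned ancestral residue differs in the descendant sequence. The function name, the toy sequences and the simple counting estimator are assumptions made for illustration; they are not the procedure used in the study.

```python
from collections import Counter


def mutation_probabilities(pairs):
    """Estimate, per ancestral residue, the fraction of aligned positions that changed.

    pairs: iterable of (ancestral, descendant) aligned sequences of equal length.
    Positions with a gap ('-') in either sequence are skipped.
    """
    seen = Counter()     # how often each residue occurs in the ancestral sequences
    changed = Counter()  # how often that residue differs in the descendant
    for ancestral, descendant in pairs:
        if len(ancestral) != len(descendant):
            raise ValueError("aligned sequences must have equal length")
        for a, d in zip(ancestral, descendant):
            if a == "-" or d == "-":
                continue
            seen[a] += 1
            if a != d:
                changed[a] += 1
    return {aa: changed[aa] / seen[aa] for aa in seen}


# Invented toy alignments (hypothetical data, not from the paper).
toy_pairs = [("MSTSY", "MATSY"), ("SSKT-", "SAKTY")]
probs = mutation_probabilities(toy_pairs)
print(probs["S"])  # serine changed at 2 of 4 aligned positions in the toy data
```

A realistic analysis would also have to account for branch lengths and for uncertainty in the reconstructed ancestral states, both of which this sketch ignores.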
Policies supporting the rapid and open sharing of proteomic data are being implemented by the leading journals in the field. The proteomics community is taking steps to ensure that data are made publicly accessible and are of high quality, a challenging task that requires the development and deployment of methods for measuring and documenting data quality metrics. On September 18, 2010, the U.S. National Cancer Institute (NCI) convened the “International Workshop on Proteomic Data Quality Metrics” in Sydney, Australia, to identify and address issues facing the development and use of such methods for open access proteomics data. The stakeholders at the workshop enumerated the key principles underlying a framework for data quality assessment in mass spectrometry data that will meet the needs of the research community, journals, funding agencies, and data repositories. Attendees discussed and agreed upon two primary needs for the wide use of quality metrics: (1) an evolving list of comprehensive quality metrics and (2) standards accompanied by software analytics. Attendees stressed the importance of increased education and training programs to promote reliable protocols in proteomics. This workshop report explores the historic precedents, key discussions, and necessary next steps to enhance the quality of open access data.
By agreement, this article is published simultaneously in the Journal of Proteome Research, Molecular and Cellular Proteomics, Proteomics, and Proteomics Clinical Applications as a public service to the research community. The peer review process was a coordinated effort conducted by a panel of referees selected by the journals.
selected reaction monitoring; bioinformatics; data quality; metrics; open access; Amsterdam Principles; standards
Following genotoxic stress, cells activate a complex kinase-based signaling network to arrest the cell cycle and initiate DNA repair. p53-defective tumor cells rewire their checkpoint response and become dependent on the p38/MK2 pathway for survival after DNA damage, despite a functional ATR-Chk1 pathway. We used functional genetics to dissect the contributions of Chk1 and MK2 to checkpoint control. We show that nuclear Chk1 activity is essential to establish a G2/M checkpoint, while cytoplasmic MK2 activity is critical for prolonged checkpoint maintenance through a process of post-transcriptional mRNA stabilization. Following DNA damage, the p38/MK2 complex relocalizes from the nucleus to the cytoplasm, where MK2 phosphorylates hnRNPA0 to stabilize Gadd45α mRNA, while p38 phosphorylates and releases the translational inhibitor TIAR. In addition, MK2 phosphorylates PARN, blocking Gadd45α mRNA degradation. Gadd45α functions within a positive feedback loop, sustaining the MK2-dependent cytoplasmic sequestration of Cdc25B/C to block mitotic entry in the presence of unrepaired DNA damage. Our findings demonstrate a critical role for the MK2 pathway in the post-transcriptional regulation of gene expression as part of the DNA damage response in cancer cells.
John Nash showed that within a complex system individuals are best off if they make the best decision that they can, taking into account the decisions of the other individuals. Here, we investigate if similar principles influence the evolution of signaling networks in multicellular animals. Specifically, by analyzing a set of metazoan species, we observe a striking negative correlation of genomically encoded tyrosine content with biological complexity (as measured by the number of cell types in each organism). We discuss how this observed tyrosine loss correlates with the expansion of tyrosine kinases in the evolution of the metazoan lineage and how it may relate to the optimization of signaling systems in multi-cellular animals. We propose that this phenomenon illustrates genome-wide adaptive evolution to accommodate beneficial genetic perturbation.
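The kind of comparison described above can be sketched as follows: compute the fraction of tyrosine residues in each proteome, then rank-correlate that fraction with the number of cell types. The species labels, sequences and cell-type counts below are invented placeholders, and the choice of a Spearman rank correlation is an assumption rather than the statistic reported in the study.

```python
def tyrosine_fraction(proteins):
    """Fraction of all residues in a list of protein sequences that are tyrosine (Y)."""
    total = sum(len(p) for p in proteins)
    return sum(p.count("Y") for p in proteins) / total


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: (proteome, number of cell types) per species.
toy = {
    "species_a": (["MYYKYA", "AYLY"], 3),
    "species_b": (["MYKLAA", "AYLL"], 10),
    "species_c": (["MKLAAY", "ALLL"], 50),
}
tyr = [tyrosine_fraction(proteome) for proteome, _ in toy.values()]
cell_types = [n for _, n in toy.values()]
rho = spearman(tyr, cell_types)
print(tyr, rho)  # in this toy data tyrosine content falls as cell types rise
```

With real proteomes, phylogenetic non-independence between species would also need to be addressed before interpreting such a correlation.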
A combined computational and biochemical approach reveals how mitotic kinases allow cell division to proceed in the presence of DNA damage.
DNA damage checkpoints arrest cell cycle progression to facilitate DNA repair. The ability to survive genotoxic insults depends not only on the initiation of cell cycle checkpoints but also on checkpoint maintenance. While activation of DNA damage checkpoints has been studied extensively, molecular mechanisms involved in sustaining and ultimately inactivating cell cycle checkpoints are largely unknown. Here, we explored feedback mechanisms that control the maintenance and termination of checkpoint function by computationally identifying an evolutionarily conserved mitotic phosphorylation network within the DNA damage response. We demonstrate that the non-enzymatic checkpoint adaptor protein 53BP1 is an in vivo target of the cell cycle kinases Cyclin-dependent kinase-1 and Polo-like kinase-1 (Plk1). We show that Plk1 binds 53BP1 during mitosis and that this interaction is required for proper inactivation of the DNA damage checkpoint. 53BP1 mutants that are unable to bind Plk1 fail to restart the cell cycle after ionizing radiation-mediated cell cycle arrest. Importantly, we show that Plk1 also phosphorylates the 53BP1-binding checkpoint kinase Chk2 to inactivate its FHA domain and inhibit its kinase activity in mammalian cells. Thus, a mitotic kinase-mediated negative feedback loop regulates the ATM-Chk2 branch of the DNA damage signaling network by phosphorylating conserved sites in 53BP1 and Chk2 to inactivate checkpoint signaling and control checkpoint duration.
DNA is constantly damaged both by factors outside our bodies (such as ultraviolet rays from sunlight) and by factors from within (such as reactive oxygen species produced during metabolism). DNA damage can lead to malfunctioning of genes, and persistent DNA damage can result in developmental disorders or the development of cancer. To ensure proper DNA repair, cells are equipped with an evolutionarily conserved DNA damage checkpoint, which stops proliferation and activates DNA repair mechanisms. Intriguingly, this DNA damage checkpoint responds to DNA damage throughout the cell cycle, except during mitosis. In this work, we have addressed how cells dismantle their DNA damage checkpoint during mitosis to allow cell division to proceed even if there is damaged DNA present. Using the observation that kinases phosphorylate their substrates on evolutionarily conserved, kinase-specific sequence motifs, we have used a combined computational and experimental approach to predict and verify key proteins involved in mitotic checkpoint inactivation. We show that the checkpoint scaffold protein 53BP1 is phosphorylated by the mitotic kinases Cdk1 and Polo-like kinase-1 (Plk1). Furthermore, we find that Plk1 can inactivate the checkpoint kinase Chk2, which is downstream of 53BP1. Plk1 is shown to be a key mediator of mitotic checkpoint inactivation, as cells that cannot activate Plk1 fail to properly dismantle the DNA damage checkpoint during mitosis and instead show DNA damage-induced Chk2 kinase activation. Two related papers, published in PLoS Biology (Vidanes et al., doi:10.1371/journal.pbio.1000286) and PLoS Genetics (Donnianni et al., doi:10.1371/journal.pgen.1000763), similarly investigate the phenomenon of DNA damage checkpoint silencing.
Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a ‘Bar Code’ format, which also displays known instances from homologous proteins through a novel ‘Instance Mapper’ protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation.
A key role of signal transduction pathways is to control transcriptional programs in the nucleus as a function of signals received by the cell via complex post-translational modification cascades. This determines cell-context specific responses to environmental stimuli. Given the difficulty of quantitating protein concentration and post-translational modifications, signaling pathway studies are still for the most part conducted one interaction at a time. Thus, genome-wide, cell-context specific dissection of signaling pathways is still an open challenge in molecular systems biology.
In this manuscript we extend the MINDy algorithm for the identification of post-translational modulators of transcription factor activity, to produce a first genome-wide map of the interface between signaling and transcriptional regulatory programs in human B cells. We show that the serine-threonine kinase STK38 emerges as the most pleiotropic signaling protein in this cellular context and we biochemically validate this finding by shRNA-mediated silencing of this kinase, followed by gene expression profile analysis. We also extensively validate the inferred interactions using protein-protein interaction databases and the kinase-substrate interaction prediction algorithm NetworKIN.
Protein kinases control cellular decision processes by phosphorylating specific substrates. Proteome-wide mapping has identified thousands of in vivo phosphorylation sites. However, systematically resolving which kinase targets each site is presently infeasible, due to the limited specificity of consensus motifs and the potential influence of contextual factors, such as protein scaffolds, localisation and expression, on cellular substrate specificity. We have therefore developed a computational method, NetworKIN, that augments motifs with context for kinases and phosphoproteins. This can pinpoint individual kinases responsible for specific in vivo phosphorylation events and yields a 2.5-fold improvement in the accuracy with which phosphorylation networks can be constructed. We show that context provides 60–80% of the computational capability to assign in vivo substrate specificity. Applying this approach to a DNA damage signalling network, we extend its cell-cycle regulation by showing that 53BP1 is a CDK1 substrate, show that Rad50 is phosphorylated by ATM kinase under genotoxic stress, and suggest novel roles of ATM in apoptosis. Finally, we present a scalable strategy to validate our predictions and use it to support the prediction that BCLAF1 is a GSK3 substrate.
Cellular signaling networks have evolved to enable swift and accurate responses, even in the face of genetic or environmental perturbation. Thus, genetic screens may not identify all the genes that regulate different biological processes. Moreover, although classical screening approaches have succeeded in providing parts lists of the essential components of signaling networks, they typically do not provide much insight into the hierarchical and functional relations that exist among these components. We describe a high-throughput screen in which we used RNA interference to systematically inhibit two genes simultaneously in 17,724 combinations to identify regulators of Drosophila JUN NH2-terminal kinase (JNK). Using both genetic and phosphoproteomics data, we then implemented an integrative network algorithm to construct a JNK phosphorylation network, which provides structural and mechanistic insights into the systems architecture of JNK signaling.
Protein kinases control cellular responses by phosphorylating specific substrates. Recent proteome-wide mapping of protein phosphorylation sites by mass spectrometry has discovered thousands of in vivo sites. Systematically assigning all 518 human kinases to all these sites is a challenging problem. The NetworKIN database (http://networkin.info) integrates consensus substrate motifs with context modelling for improved prediction of cellular kinase–substrate relations. Based on the latest human phosphoproteome from the Phospho.ELM and PhosphoSite databases, the resource offers insight into phosphorylation-modulated interaction networks. Here, we describe how NetworKIN can be used for both global and targeted molecular studies. Via the web interface users can query the database of precomputed kinase–substrate relations or obtain predictions on novel phosphoproteins. The database currently contains a predicted phosphorylation network with 20 224 site-specific interactions involving 3978 phosphoproteins and 73 human kinases from 20 families.
WW domains are protein modules that mediate protein-protein interactions through recognition of proline-rich peptide motifs and phosphorylated serine/threonine-proline sites. To pursue the functional properties of WW domains, we employed mass spectrometry to identify 148 proteins that associate with 10 human WW domains. Many of these proteins represent novel WW domain-binding partners and are components of multiprotein complexes involved in molecular processes, such as transcription, RNA processing, and cytoskeletal regulation. We validated one complex in detail, showing that WW domains of the AIP4 E3 protein-ubiquitin ligase bind directly to a PPXY motif in the p68 subunit of pre-mRNA cleavage and polyadenylation factor Im in a manner that promotes p68 ubiquitylation. The tested WW domains fall into three broad groups on the basis of hierarchical clustering with respect to their associated proteins; each such cluster of bound proteins displayed a distinct set of WW domain-binding motifs. We also found that separate WW domains from the same protein or closely related proteins can have different specificities for protein ligands and also demonstrated that a single polypeptide can bind multiple classes of WW domains through separate proline-rich motifs. These data suggest that WW domains provide a versatile platform to link individual proteins into physiologically important networks.
Post-translational phosphorylation is one of the most common protein modifications. Phosphorylated serine, threonine and tyrosine residues play critical roles in the regulation of many cellular processes. The fast-growing number of research reports on protein phosphorylation points to a general need for an accurate database dedicated to phosphorylation to provide easily retrievable information on phosphoproteins.
Phospho.ELM is a new resource containing experimentally verified phosphorylation sites manually curated from the literature and is developed as part of the ELM (Eukaryotic Linear Motif) resource. Phospho.ELM constitutes the largest searchable collection of phosphorylation sites available to the research community. The Phospho.ELM entries store information about substrate proteins with the exact positions of residues known to be phosphorylated by cellular kinases. Additional annotation includes literature references, subcellular compartment, tissue distribution, and information about the signaling pathways involved as well as links to the molecular interaction database MINT. Phospho.ELM version 2.0 contains 1703 phosphorylation site instances for 556 phosphorylated proteins.
Phospho.ELM will be a valuable tool both for molecular biologists working on protein phosphorylation sites and for bioinformaticians developing computational predictions on the specificity of phosphorylation reactions.
post-translational modification; protein kinase; bioinformatics
SRS (Sequence Retrieval System) is a widely used keyword search engine for querying biological databases. BLAST2 is the most widely used tool to query databases by sequence similarity search. These tools allow users to retrieve sequences by shared keyword or by shared similarity, with many public web servers available. However, with the increasingly large datasets available it is now quite common that a user is interested in some subset of homologous sequences but has no efficient way to restrict retrieval to that set. By allowing the user to control SRS from the BLAST output, BLAST2SRS (http://blast2srs.embl.de/) aims to meet this need. This server therefore combines the two ways to search sequence databases: similarity and keyword.
Multidomain proteins predominate in eukaryotic proteomes. Individual functions assigned to different sequence segments combine to create a complex function for the whole protein. While on-line resources are available for revealing globular domains in sequences, there has hitherto been no comprehensive collection of small functional sites/motifs comparable to the globular domain resources, yet these are as important for the function of multidomain proteins. Short linear peptide motifs are used for cell compartment targeting, protein–protein interaction, regulation by phosphorylation, acetylation, glycosylation and a host of other post-translational modifications. ELM, the Eukaryotic Linear Motif server at http://elm.eu.org/, is a new bioinformatics resource for investigating candidate short non-globular functional motifs in eukaryotic proteins, aiming to fill the void in bioinformatics tools. Sequence comparisons with short motifs are difficult to evaluate because the usual significance assessments are inappropriate. Therefore the server is implemented with several logical filters to eliminate false positives. Current filters are for cell compartment, globular domain clash and taxonomic range. In favourable cases, the filters can reduce the number of retained matches by an order of magnitude or more.
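The filtering idea can be illustrated with a minimal sketch: a short motif is expressed as a regular expression, every match in a sequence is collected, and matches that fall inside an annotated globular domain are discarded. The `P..P` pattern, the toy sequence, the domain coordinates and both function names are illustrative assumptions and do not reproduce an actual ELM entry or the server's implementation.

```python
import re


def scan_motif(sequence, pattern):
    """Return (start, end, matched_text) for every regex match, including overlapping ones."""
    # A lookahead with a capture group lets finditer report overlapping matches.
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start(1), m.end(1), m.group(1)) for m in lookahead.finditer(sequence)]


def filter_domain_clash(matches, domains):
    """Drop matches that overlap any globular domain, given as half-open (start, end) ranges."""
    return [
        m for m in matches
        if not any(m[0] < d_end and d_start < m[1] for d_start, d_end in domains)
    ]


# Hypothetical inputs: an invented sequence and one invented domain annotation.
sequence = "MAPLTPGSSKKPAAPLLE"
domains = [(0, 8)]

matches = scan_motif(sequence, "P..P")
kept = filter_domain_clash(matches, domains)
print(matches)  # all pattern matches
print(kept)     # matches that survive the globular domain filter
```

Cell compartment and taxonomic range filters could follow the same shape, each taking the match list and returning the subset compatible with the annotation of the query protein.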
A major challenge in the proteomics and structural genomics era is to predict protein structure and function, including identification of those proteins that are partially or wholly unstructured. Non-globular sequence segments often contain short linear peptide motifs (e.g. SH3-binding sites) which are important for protein function. We present here a new tool for discovery of such unstructured, or disordered, regions within proteins. GlobPlot (http://globplot.embl.de) is a web service that allows the user to plot the tendency within the query protein for order/globularity and disorder. We show examples with known proteins where it successfully identifies inter-domain segments containing linear motifs, and also apparently ordered regions that do not contain any recognised domain. GlobPlot may be useful in domain hunting efforts. The plots indicate that instances of known domains may often contain additional N- or C-terminal segments that appear ordered. Thus GlobPlot may be of use in the design of constructs corresponding to globular proteins, as needed for many biochemical studies, particularly structural biology. GlobPlot has a pipeline interface, GlobPipe, for the advanced user to do whole proteome analysis. GlobPlot can also be used as a generic infrastructure package for the graphical display of any propensity.
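One simple way to turn per-residue propensities into a curve of this kind is a cumulative sum along the sequence, in which stretches where the curve rises flag candidate disordered segments. The sketch below assumes that approach; the propensity values, the minimum segment length and the function names are invented for illustration and are not GlobPlot's parameters.

```python
from itertools import accumulate

# Hypothetical propensities: positive favours disorder, negative favours order.
TOY_PROPENSITY = {"P": 1.0, "S": 0.5, "G": 0.5, "L": -1.0, "V": -1.0, "I": -1.0}


def running_sum(sequence, propensity, default=0.0):
    """Cumulative sum of per-residue propensities along the sequence."""
    return list(accumulate(propensity.get(aa, default) for aa in sequence))


def rising_segments(curve, min_length=3):
    """Half-open (start, end) ranges where the curve rises for at least min_length residues."""
    segments = []
    start = None
    previous = 0.0
    for i, value in enumerate(curve):
        if value > previous:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                segments.append((start, i))
            start = None
        previous = value
    if start is not None and len(curve) - start >= min_length:
        segments.append((start, len(curve)))
    return segments


curve = running_sum("LVIPSGPLL", TOY_PROPENSITY)
segments = rising_segments(curve)
print(curve)
print(segments)  # residue ranges where the toy curve climbs
```

Any other per-residue scale, such as hydrophobicity, could be passed in place of `TOY_PROPENSITY` without changing the rest of the code.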
Many aspects of cell signalling, trafficking, and targeting are governed by interactions between globular protein domains and short peptide segments. These domains often bind multiple peptides that share a common sequence pattern, or “linear motif” (e.g., SH3 binding to PxxP). Many domains are known, though comparatively few linear motifs have been discovered. Their short length (three to eight residues) and the fact that they often reside in disordered regions of proteins make them difficult to detect through sequence comparison or experiment. Nevertheless, each new motif provides critical molecular details of how interaction networks are constructed, and can explain how one protein is able to bind to very different partners. Here we show that binding motifs can be detected using data from genome-scale interaction studies, and thus avoid the normally slow discovery process. Our approach, based on motif over-representation in non-homologous sequences, rediscovers known motifs and predicts dozens of others. Direct binding experiments reveal that two predicted motifs are indeed protein-binding modules: a DxxDxxxD protein phosphatase 1 binding motif with a KD of 22 μM and a VxxxRxYS motif that binds Translin with a KD of 43 μM. We estimate that there are dozens or even hundreds of linear motifs yet to be discovered that will give molecular insight into protein networks and greatly illuminate cellular processes.
Many protein interactions are mediated by short amino acid motifs. The authors describe a new approach to identify these interaction motifs and experimentally validate some of their binding predictions.
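The notion of motif over-representation can be sketched as a binomial test: count how many sequences in a set of interaction partners contain the pattern, then ask how likely that count would be if each sequence matched independently with some background probability. The partner sequences and the background probability below are invented, the sequences are simply assumed to be non-homologous, and the binomial model is an assumption that may differ from the statistics used in the study.

```python
import re
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def motif_enrichment(sequences, pattern, background_prob):
    """Count sequences containing the pattern and return (hits, binomial tail probability)."""
    regex = re.compile(pattern)
    hits = sum(1 for s in sequences if regex.search(s))
    return hits, binomial_tail(hits, len(sequences), background_prob)


# Hypothetical partner sequences; the DxxDxxxD pattern is written as a regular expression.
partners = ["MKDAADLLLDQ", "AADSSDKKKDL", "MLLKKAAGG", "QDPPDAAADE"]
hits, p_value = motif_enrichment(partners, "D..D...D", background_prob=0.1)
print(hits, p_value)  # a small p_value suggests the pattern is over-represented
```

In practice the background probability would be estimated from a reference proteome, and the resulting probabilities would need correction for the large number of candidate patterns tested.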