We present a general probabilistic framework for predicting the substrate specificity of enzymes. We designed this approach to be easily applicable to different organisms and enzymes. Therefore, our predictive models do not rely on species-specific properties and use mostly sequence-derived data. Maximum Likelihood optimization is used to fine-tune model parameters and the Akaike Information Criterion is employed to overcome the issue of correlated variables. As a proof-of-principle, we apply our approach to predicting general substrate specificity of yeast methyltransferases (MTases). As input, we use several physico-chemical and biological properties of MTases: structural fold, isoelectric point, expression pattern and cellular localization. Our method accurately predicts whether a yeast MTase methylates a protein, RNA or another molecule. Among our experimentally tested predictions, 89% were confirmed, including the surprising prediction that YOR021C is the first known MTase with a SPOUT fold that methylates a substrate other than RNA (protein). Our approach not only allows for highly accurate prediction of functional specificity of MTases, but also provides insight into general rules governing MTase substrate specificity.
Our approach is easily applicable to different organisms, because it does not rely on species-specific properties and uses mostly sequence-derived and other readily available data (e.g. isoelectric point or predicted structural fold). Tests on yeast MTases indicate that the accuracy of our predictions is ∼90%. We show that knowledge of substrate binding sites or corresponding motifs is not crucial for highly accurate general substrate specificity predictions of enzymes, and provide new insights into how such specificities are achieved at the molecular level. We predict substrate specificities not yet observed for a given class of enzymes, and experimentally verify our predictions.
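The role of the Akaike Information Criterion mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate feature sets, log-likelihood values and parameter counts below are invented purely for illustration. The sketch only shows how AIC = 2k - 2 ln(L) penalizes extra parameters, so that adding a correlated variable which barely improves the likelihood is not rewarded.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidate models (NOT taken from the paper):
# (feature set, maximized log-likelihood, number of free parameters)
candidates = [
    ("fold only", -42.0, 3),
    ("fold + isoelectric point", -35.5, 5),
    ("fold + isoelectric point + localization", -35.0, 8),
]


def select_model(models):
    """Return the candidate with the lowest AIC."""
    return min(models, key=lambda m: aic(m[1], m[2]))


best = select_model(candidates)
print(best[0], aic(best[1], best[2]))
```

With these made-up numbers the third model has the highest likelihood, but its three extra parameters cost more than the small likelihood gain is worth, so the two-feature model is selected.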
The Acel_2062 protein from Acidothermus cellulolyticus is a protein of unknown function. Initial sequence analysis predicted that it was a metallopeptidase from the presence of a motif conserved amongst the Asp-zincins, which are peptidases that contain a single, catalytic zinc ion ligated by the histidines and aspartic acid within the motif (HEXXHXXGXXD). The Acel_2062 protein was chosen by the Joint Center for Structural Genomics for crystal structure determination to explore novel protein sequence space and structure-based function annotation.
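As a rough illustration of how a motif such as HEXXHXXGXXD can be located in a sequence, the sketch below scans a protein string with a regular expression, treating X as any residue. The function name and the toy sequence are hypothetical; the toy sequence is not the Acel_2062 sequence, and the original analysis may well have relied on profile-based methods rather than a plain pattern match.

```python
import re

# X in HEXXHXXGXXD stands for any residue; here it is matched as any uppercase letter.
# The lookahead lets overlapping occurrences be reported as well.
ASP_ZINCIN_MOTIF = re.compile(r"(?=(HE[A-Z]{2}H[A-Z]{2}G[A-Z]{2}D))")


def find_motif(sequence: str):
    """Return (0-based start, matched segment) for every motif occurrence."""
    return [(m.start(), m.group(1))
            for m in ASP_ZINCIN_MOTIF.finditer(sequence.upper())]


# Toy sequence invented for illustration only.
toy_sequence = "MKHEAAHTTGLLDK"
print(find_motif(toy_sequence))
```

A pattern hit of this kind is only a first hint of metallopeptidase function, which is why structure-based annotation of the protein was of interest.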
The crystal structure confirmed that the Acel_2062 protein consisted of a single, zincin-like metallopeptidase-like domain. The Met-turn, a structural feature thought to be important for a Met-zincin because it stabilizes the active site, is absent, and its stabilizing role may have been conferred to the C-terminal Tyr113. In our crystallographic model there are two molecules in the asymmetric unit and from size-exclusion chromatography, the protein dimerizes in solution. A water molecule is present in the putative zinc-binding site in one monomer, which is replaced by one of two observed conformations of His95 in the other.
The Acel_2062 protein is structurally related to the zincins. It contains the minimum structural features of a member of this protein superfamily, and can be described as a “mini-zincin”. There is a striking parallel with the structure of a mini-Glu-zincin, which represents the minimum structure of a Glu-zincin (a metallopeptidase in which the third zinc ligand is a glutamic acid). Rather than being an ancestral state, phylogenetic analysis suggests that the mini-zincins are derived from larger proteins.
Acel_2062; Metallopeptidase; Zincin; JCSG; Structural genomics
Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.
A novel highly conserved protein domain, DUF162 [Pfam: PF02589], can be mapped to two proteins: LutB and LutC. Both proteins are encoded by a highly conserved LutABC operon, which has been implicated in lactate utilization in bacteria. Based on our analysis of its sequence, structure, and recent experimental evidence reported by other groups, we hereby redefine DUF162 as the LUD domain family.
JCSG solved the first crystal structure [PDB:2G40] from the LUD domain family: LutC protein, encoded by ORF DR_1909, of Deinococcus radiodurans. LutC shares features with domains in the functionally diverse ISOCOT superfamily. We have observed that the LUD domain has an increased abundance in the human gut microbiome.
We propose a model for substrate and cofactor binding and regulation in the LUD domain. The significance of LUD-containing proteins in the human gut microbiome and the implication of lactate metabolism in the radiation resistance of Deinococcus radiodurans are discussed.
LUD; DUF162; LutB; LutC; Domain of unknown function; Deinococcus radiodurans
This study uses the Pfam database to show that the sequence redundancy of protein structures deposited in the PDB is increasing. The possible reasons behind this trend are discussed.
High-resolution structural knowledge is key to understanding how proteins function at the molecular level. The number of entries in the Protein Data Bank (PDB), the repository of all publicly available protein structures, continues to increase, with more than 8000 structures released in 2012 alone. The authors of this article have studied how structural coverage of the protein-sequence space has changed over time by monitoring the number of Pfam families that acquired their first representative structure each year from 1976 to 2012. Twenty years ago, for every 100 new PDB entries released, an estimated 20 Pfam families acquired their first structure. By 2012, this decreased to only about five families per 100 structures. The reasons behind the slower pace at which previously uncharacterized families are being structurally covered were investigated. It was found that although more than 50% of current Pfam families are still without a structural representative, this set is enriched in families that are small, functionally uncharacterized or rich in problem features such as intrinsically disordered and transmembrane regions. While these are important constraints, the reasons why it may not yet be time to give up the pursuit of a targeted but more comprehensive structural coverage of the protein-sequence space are discussed.
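The rates quoted above lend themselves to a quick back-of-the-envelope calculation. The sketch below uses only the rounded figures from the text (about 20 and about 5 families per 100 new entries, and roughly 8000 structures released in 2012). The function names are hypothetical and the resulting numbers are rough estimates, not values reported by the study.

```python
def first_structures_per_100(new_families: int, new_entries: int) -> float:
    """Families gaining a first structure per 100 newly released PDB entries."""
    return 100.0 * new_families / new_entries


def expected_new_families(rate_per_100: float, new_entries: int) -> float:
    """Families expected to gain a first structure at a given rate."""
    return rate_per_100 * new_entries / 100.0


# Rounded figures taken from the text; the outputs are only approximate.
rate_twenty_years_ago = 20.0   # families per 100 new entries
rate_2012 = 5.0                # families per 100 new entries
entries_2012 = 8000            # "more than 8000", so treat this as a lower bound

print(expected_new_families(rate_2012, entries_2012))
print(rate_twenty_years_ago / rate_2012)
```

On these figures, the roughly fourfold drop in the per-entry rate means that several thousand new structures per year translate into only a few hundred newly covered families.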
Pfam families; structural coverage; protein-sequence space
Recent years have seen the establishment of structural genomics centers that explicitly target integral membrane proteins. Here, we review the advances in targeting these extremely high-hanging fruits of structural biology in high-throughput mode. We observe that the experimental determination of high-resolution structures of integral membrane proteins is increasingly successful both in terms of getting structures and of covering important protein families, e.g. from Pfam. Structural genomics has begun to contribute significantly toward this progress. An important component of this contribution is the set up of robotic pipelines that generate a wealth of experimental data for membrane proteins. We argue that prediction methods for the identification of membrane regions and for the comparison of membrane proteins largely suffice to meet the challenges of target selection for structural genomics of membrane proteins. In contrast, we need better methods to prioritize the most promising members in a family of closely related proteins and to annotate protein function from sequence and structure in absence of homology.
alpha-helical integral membrane proteins; structural genomics; protein families; protein structure; target selection; function prediction
It is a worthy goal to completely characterize all human proteins in terms of their domains. Here, using the Pfam database, we asked how far we have progressed in this endeavour. Ninety per cent of proteins in the human proteome matched at least one of 5494 manually curated Pfam-A families. In contrast, human residue coverage by Pfam-A families was <45%, with 9418 automatically generated Pfam-B families adding a further 10%. Even after excluding predicted signal peptide regions and short regions (<50 consecutive residues) unlikely to harbour new families, for ∼38% of the human protein residues, there was no information in Pfam about conservation and evolutionary relationship with other protein regions. This uncovered portion of the human proteome was found to be distributed over almost 25 000 distinct protein regions. Comparison with proteins in the UniProtKB database suggested that the human regions that exhibited similarity to thousands of other sequences were often either divergent elements or N- or C-terminal extensions of existing families. Thirty-four per cent of regions, on the other hand, matched fewer than 100 sequences in UniProtKB. Most of these did not appear to share any relationship with existing Pfam-A families, suggesting that thousands of new families would need to be generated to cover them. Also, these latter regions were particularly rich in amino acid compositional bias such as the one associated with intrinsic disorder. This could represent a significant obstacle toward their inclusion into new Pfam families. Based on these observations, a major focus for increasing Pfam coverage of the human proteome will be to improve the definition of existing families. New families will also be built, prioritizing those that have been experimentally functionally characterized.
Database URL: http://pfam.sanger.ac.uk/
Detection of protein homology via sequence similarity has important applications in biology, from protein structure and function prediction to reconstruction of phylogenies. Although current methods for aligning protein sequences are powerful, challenges remain, including problems with homologous overextension of alignments and with regions under convergent evolution. Here, we test the ability of the profile hidden Markov model method HMMER3 to correctly assign homologous sequences to >13 000 manually curated families from the Pfam database. We identify problem families using protein regions that match two or more Pfam families not currently annotated as related in Pfam. We find that HMMER3 E-value estimates seem to be less accurate for families that feature periodic patterns of compositional bias, such as the ones typically observed in coiled-coils. These results support the continued use of manually curated inclusion thresholds in the Pfam database, especially on the subset of families that have been identified as problematic in experiments such as these. They also highlight the need for developing new methods that can correct for this particular type of compositional bias.
We have identified a new protein domain, which we have named the SHOCT domain (Short C-terminal domain). This domain is widespread in bacteria, with over a thousand examples, but we found that it is missing from the most commonly studied model organisms, despite being present in closely related species. Its predominantly C-terminal location, co-occurrence with numerous other domains and short size are reminiscent of the Gram-positive anchor motif; however, it is present in a much wider range of species. We suggest several hypotheses about the function of SHOCT, including oligomerisation and nucleic acid binding. Our initial experiments do not support its role as an oligomerisation domain.
The plant SLAC1 anion channel controls turgor pressure in the aperture-defining guard cells of plant stomata, thereby regulating exchange of water vapor and photosynthetic gases in response to environmental signals such as drought or high levels of carbon dioxide. We determined the crystal structure of a bacterial homolog of SLAC1 at 1.20 Å resolution, and we have used structure-inspired mutagenesis to analyze the conductance properties of SLAC1 channels. SLAC1 is a symmetric trimer composed of quasi-symmetric subunits, each having ten transmembrane helices arranged from helical hairpin pairs to form a central five-helix transmembrane pore that is gated by an extremely conserved phenylalanine residue. Conformational features suggest a mechanism for control of gating by kinase activation, and electrostatic features of the pore coupled with electrophysiological characteristics suggest that selectivity among different anions is largely a function of the energetic cost of ion dehydration.
Hepatitis C virus (HCV) p7 is a membrane-associated oligomeric protein harboring ion channel activity. It is essential for effective assembly and release of infectious HCV particles and an attractive target for antiviral intervention. Yet, the self-assembly and molecular mechanism of p7 ion channelling are currently only partially understood. Using molecular dynamics simulations (aggregate time 1.2 µs), we show that p7 can form stable oligomers of four to seven subunits, with a bias towards six or seven subunits, and suggest that p7 self-assembles in a sequential manner, with tetrameric and pentameric complexes forming as intermediate states leading to the final hexameric or heptameric assembly. We describe a model of a hexameric p7 complex, which forms a transiently-open channel capable of conducting ions in simulation. We investigate the ability of the hexameric model to flexibly rearrange to adapt to the local lipid environment, and demonstrate how this model can be reconciled with low-resolution electron microscopy data. In the light of these results, a view of p7 oligomerization is proposed, wherein hexameric and heptameric complexes may coexist, forming minimalist, yet robust functional ion channels. In the absence of a high-resolution p7 structure, the models presented in this paper can prove valuable as a substitute structure in future studies of p7 function, or in the search for p7-inhibiting drugs.
Hepatitis C remains a serious global health problem affecting more than 2% of the world's population, and current therapies are effective in only a subset of patients, necessitating an ongoing search for new treatments. The p7 viroporin is considered to be an attractive possible drug target, but rational drug design is hampered by the absence of a high-resolution p7 structure. In this paper, we explore possible structures of oligomeric p7 channels, and discuss the strengths and shortcomings of these models with respect to experimentally determined properties, such as pore-lining residues, ion conductance, and compatibility with low-resolution electron microscopy images. Our results present an image of p7 as a rudimentary, minimalistic ion channel, capable of existing in multiple oligomeric states but exhibiting a bias towards hexamers and heptamers. We believe that the work presented here will be valuable for future research by providing plausible 3-dimensional atomic-resolution models for the visualization of the p7 viroporin and serve as a basis for future computational studies.
Bacterial chemoreceptors provide an important model for understanding signalling processes. In the serine receptor Tsr from E. coli, a binding event in the periplasmic domain of the receptor dimer causes a shift in a single transmembrane helix of roughly 0.15 nm towards the cytoplasm. This small change is propagated through the ∼22 nm length of the receptor, causing downstream inhibition of the kinase CheA. This requires interactions within a trimer of receptor dimers. Additionally, the signal is amplified across a 53,000 nm² array of chemoreceptor proteins, including ∼5,200 receptor trimers-of-dimers, at the cell pole. Despite a wealth of experimental data on the system, including high resolution structures of individual domains and extensive mutagenesis data, it remains uncertain how information is communicated across the receptor from the binding event to the downstream effectors. We present a molecular model of the entire Tsr dimer, and examine its behaviour using coarse-grained molecular dynamics and elastic network modelling. We observe a large bending in dimer models between the linker domain HAMP and coiled-coil domains, which is supported by experimental data. Models of the trimer of dimers, built from the dimer models, are more constrained and likely represent the signalling state. Simulations of the models in a 70 nm diameter vesicle with a biologically realistic lipid mixture reveal specific lipid interactions and oligomerisation of the trimer of dimers. The results indicate a mechanism whereby small motions of a single helix can be amplified through HAMP domain packing, to initiate large changes in the whole receptor structure.
To understand cell signalling events requires a physical model of the structure and behaviour of the signalling proteins involved. The methyl-accepting chemoreceptor proteins direct bacterial movement towards food sources and away from toxins. Based on experimental data we have built structural models of the serine chemoreceptor (Tsr) as a dimer, which is incapable of activating the downstream kinase CheA, and as a trimer of dimers, which can activate CheA. We have performed molecular dynamics simulation to reveal the behaviour of these two forms in a planar lipid bilayer and in a 70 nm diameter lipid vesicle with a mixture of lipids mimicking the E. coli inner membrane. We show that in isolation the dimers undergo a bending movement around the central HAMP domain, whereas the trimer-of-dimers model does not. Comparison with published experimental data suggests that these bending motions are real, and that they occur in the trimer of dimers only in response to ligand binding. Drawing together these observations with studies showing that the signalling event involves small piston motions in the transmembrane helices suggests that the bending motion is frustrated in the unliganded trimer of dimers, and that ligand binding induces bending by repacking the HAMP interface.
Nitric oxide reductases (NORs) are membrane proteins that catalyze the reduction of nitric oxide (NO) to nitrous oxide (N2O), which is a critical step of the nitrate respiration process in denitrifying bacteria. Using the recently determined first crystal structure of the cytochrome c-dependent NOR (cNOR) [Hino T, Matsumoto Y, Nagano S, Sugimoto H, Fukumori Y, et al. (2010) Structural basis of biological N2O generation by bacterial nitric oxide reductase. Science 330: 1666–70.], we performed extensive all-atom molecular dynamics (MD) simulations of cNOR within an explicit membrane/solvent environment to fully characterize water distribution and dynamics as well as hydrogen-bonded networks inside the protein, yielding the atomic details of functionally important proton channels. Simulations reveal two possible proton transfer pathways leading from the periplasm to the active site, while no pathways from the cytoplasmic side were found, consistent with the experimental observations that cNOR is not a proton pump. One of the pathways, which was newly identified in the MD simulation, is blocked in the crystal structure and requires small structural rearrangements to allow for water channel formation. That pathway is equivalent to the functional periplasmic cavity postulated in cbb3 oxidase, which illustrates that the two enzymes share some elements of the proton transfer mechanisms and confirms a close evolutionary relation between NORs and C-type oxidases. Several mechanisms of the critical proton transfer steps near the catalytic center are proposed.
Denitrification is an anaerobic process performed by several bacteria as an alternative to aerobic respiration. A key intermediate step is catalyzed by the nitric oxide reductase (NOR) enzyme, which is situated in the cytoplasmic membrane. Proton delivery to the catalytic site inside NOR is an important part of its functioning. In this work we use molecular dynamics simulations to describe water distribution and to identify proton transfer pathways in cNOR. Our results reveal two channels from the periplasmic side of the membrane and none from the cytoplasmic side, indicating that cNOR is not a proton pump. It is our hope that these results will provide a basis for further experimental and computational studies aimed to understand details of the NOR mechanism. Furthermore, this work sheds light on the molecular evolution of respiratory enzymes.
The soluble monomeric domain of lipoprotein YxeF from the Gram-positive bacterium B. subtilis was selected by the Northeast Structural Genomics Consortium (NESG) as a target of a biomedical theme project focusing on the structure determination of the soluble domains of bacterial lipoproteins. The solution NMR structure of YxeF reveals a calycin fold and distant homology with the lipocalin Blc from the Gram-negative bacterium E. coli. In particular, the characteristic β-barrel, which is open to the solvent at one end, is extremely well conserved in YxeF with respect to Blc. The identification of YxeF as the first lipocalin homologue occurring in a Gram-positive bacterium suggests that lipocalins emerged before the evolutionary divergence of Gram-positive and Gram-negative bacteria. Since YxeF is devoid of the α-helix that packs in all lipocalins with known structure against the β-barrel to form a second hydrophobic core, we propose to introduce a new lipocalin sub-family named ‘slim lipocalins’, with YxeF and the other members of Pfam family PF11631 to which YxeF belongs constituting the first representatives. The results presented here exemplify the impact of structural genomics to enhance our understanding of biology and to generate new biological hypotheses.
As the deluge of genomic DNA sequence grows, the fraction of protein sequences that have been manually curated falls. In turn, as the number of laboratories with the ability to sequence genomes in a high-throughput manner grows, the informatics capability of those labs to accurately identify and annotate all genes within a genome may often be lacking. These issues have led to fears about transitive annotation errors making sequence databases less reliable. During the lifetime of the Pfam protein families database a number of protein families have been built, which were later identified as composed solely of spurious open reading frames (ORFs) either on the opposite strand or in a different, overlapping reading frame with respect to the true protein-coding or non-coding RNA gene. These families were deleted and are no longer available in Pfam. However, we realized that these may perform a useful function to identify new spurious ORFs. We have collected these families together in AntiFam along with additional custom-made families of spurious ORFs. This resource currently contains 23 families that identified 1310 spurious proteins in UniProtKB and a further 4119 spurious proteins in a collection of metagenomic sequences. UniProt has adopted AntiFam as a part of the UniProtKB quality control process and will investigate these spurious proteins for exclusion.
The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second ‘Quest for Orthologs’ meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.
Pfam is a widely used database of protein families, currently containing more than 13 000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the ‘sunburst’ representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.
InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
Saccharides play a central role in the nutrition of all living organisms. Whereas several saccharide uptake systems are shared between the different phylogenetic kingdoms, the phosphoenolpyruvate-dependent phosphotransferase system exists almost exclusively in bacteria. This multi-component system includes an integral membrane protein EIIC that transports saccharides and assists in their phosphorylation. Here we present the crystal structure of an EIIC from Bacillus cereus that transports diacetylchitobiose. The EIIC is a homodimer, with an expansive interface formed between the N-terminal halves of the two protomers. The C-terminal half of each protomer has a large binding pocket that contains a diacetylchitobiose, which is occluded from both sides of the membrane with its site of phosphorylation near the conserved His250 and Glu334 residues. The structure shows the architecture of this important class of transporters, identifies the determinants of substrate binding and phosphorylation, and provides a framework for understanding the mechanism of sugar translocation.
The TrkH/TrkG/KtrB proteins mediate K+ uptake in bacteria and likely evolved from simple K+ channels by multiple gene duplications or fusions. Here we present the crystal structure of a TrkH from Vibrio parahaemolyticus. TrkH is a homodimer, and each protomer contains an ion permeation pathway. A selectivity filter, similar in architecture to those of K+ channels but significantly shorter, is lined by backbone and side chain oxygen atoms. Functional studies showed that the TrkH allows permeation of K+ and Rb+ but not smaller ions such as Na+ or Li+. Immediately intracellular to the selectivity filter are an intramembrane loop and an arginine residue, both highly conserved, which constrict the permeation pathway. Substituting the arginine with an alanine significantly increases the rate of K+ flux. These results reveal the molecular basis of K+ selectivity and suggest a novel gating mechanism by this large and important family of membrane transport proteins.
The New York Consortium on Membrane Protein Structure (NYCOMPS) was formed to accelerate the acquisition of structural information on membrane proteins by applying a structural genomics approach. NYCOMPS comprises a bioinformatics group, a centralized facility operating a high-throughput cloning and screening pipeline, a set of associated wet labs that perform high-level protein production and structure determination by X-ray crystallography and NMR, and a set of investigators focused on methods development. In the first three years of operation, the NYCOMPS pipeline has so far produced and screened 7,250 expression constructs for 8,045 target proteins. Approximately 600 of these verified targets were scaled up to levels required for structural studies, so far yielding 24 membrane protein crystals. Here we describe the overall structure of NYCOMPS and provide details on the high-throughput pipeline.
Membrane proteins; Structural genomics; High throughput; NMR; X-ray
VPA0419; yiiS; PFAM 04175; structural genomics; GFT NMR
Intrinsically disordered proteins are predicted to be highly abundant and play broad biological roles in eukaryotic cells. In particular, by virtue of their structural malleability and propensity to interact with multiple binding partners, disordered proteins are thought to be specialized for roles in signaling and regulation. However, these concepts are based on in silico analyses of translated whole genome sequences, not on large-scale analyses of proteins expressed in living cells. Therefore, whether these concepts broadly apply to expressed proteins is currently unknown. Previous studies have shown that heat treatment of cell extracts leads to partial enrichment of soluble, disordered proteins. Based on this observation, we sought to address the current dearth of knowledge about expressed, disordered proteins by performing a large-scale proteomics study of thermo-stable proteins isolated from mouse fibroblast cells. Using novel multidimensional chromatography methods and mass spectrometry, we identified a total of 1,320 thermo-stable proteins from these cells. Further, we used a variety of bioinformatics methods to analyze the structural and biological properties of these proteins. Interestingly, more than 900 of these expressed proteins were predicted to be substantially disordered. These were divided into two categories, with 514 predicted to be predominantly disordered and 395 predicted to exhibit both disordered and ordered/folded features. In addition, 411 of the thermo-stable proteins were predicted to be folded. Despite the use of heat treatment (60 min. at 98 °C) to partially enrich for disordered proteins, which might have been expected to select for small proteins, the sequences of these proteins exhibited a wide range of lengths (622 ± 555 residues (average length ± standard deviation) for disordered proteins and 569 ± 598 residues for folded proteins).
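The category counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are hypothetical, and the percentage is derived here rather than quoted from the study.

```python
# Counts of thermo-stable proteins per predicted category, as given in the text.
counts = {
    "predominantly disordered": 514,
    "disordered and ordered/folded": 395,
    "folded": 411,
}

total = sum(counts.values())
substantially_disordered = (counts["predominantly disordered"]
                            + counts["disordered and ordered/folded"])
fraction_disordered = substantially_disordered / total

print(total)                          # 1320, matching the reported total
print(substantially_disordered)       # 909, i.e. "more than 900"
print(round(fraction_disordered, 3))  # 0.689
```

The three categories sum to the reported total of 1,320, and on these counts roughly two-thirds of the identified thermo-stable proteins fall into the two disordered categories.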
Computational structural analyses revealed several unexpected features of the thermo-stable proteins: 1) disordered domains and coiled-coil domains occurred together in a large number of disordered proteins, suggesting functional interplay between these domains, and 2) more than 170 proteins contained lengthy domains (>300 residues) known to be folded. Reference to Gene Ontology Consortium functional annotations revealed that, while disordered proteins play diverse biological roles in mouse fibroblasts, they do exhibit heightened involvement in several functional categories, including cytoskeletal structure and cell movement, metabolic and biosynthetic processes, organelle structure, cell division, gene transcription, and ribonucleoprotein complexes. We believe that these results reflect the general properties of the mouse intrinsically disordered proteome (IDP-ome), although they also reflect the specialized physiology of fibroblast cells. Large-scale identification of expressed, thermo-stable proteins from other cell types in the future, grown under varied physiological conditions, will dramatically expand our understanding of the structural and biological properties of disordered eukaryotic proteins.
intrinsically disordered proteins; intrinsically unstructured proteins; proteomics; mammalian proteome; thermo-stable proteins