Enzymes belonging to the same superfamily of proteins generally operate on a variety of substrates and are inhibited by a wide selection of inhibitors. In this work, our main objective was to expand the scope of studies that consider only the catalytic and binding-pocket amino acids when analyzing enzyme specificity and to include instead a wider category, which we have named the Interface Forming Residues (IFR). We were motivated to identify those amino acids whose solvent accessibility decreases after docking of different types of inhibitors to subclasses of serine proteases, and then to create a table (matrix) of all amino acid positions at the interface together with their respective occupancies. Our goal is to establish a platform for analysis of the relationship between IFR characteristics and binding properties/specificity for bi-molecular complexes.
We propose a novel method for describing binding properties and delineating serine protease specificity by compiling an exhaustive table of interface forming residues (IFR) for serine proteases and their inhibitors. Currently, the Protein Data Bank (PDB) does not contain all the data that our analysis would require. Therefore, an in silico approach was designed for building the corresponding complexes.
The IFRs are obtained by "rigid body docking" among 70 structurally aligned, sequence-wise non-redundant serine protease structures with 3 inhibitors: bovine pancreatic trypsin inhibitor (BPTI), ecotin and ovomucoid third domain inhibitor. The table (matrix) of all amino acid positions at the interface and their respective occupancies is created. We also developed a new computational protocol for predicting IFRs for those complexes that have not yet been determined experimentally, achieving an accuracy of at least 0.97.
Serine protease interfaces prefer polar residues (including glycine), with some exceptions. Charged residues were found to be uniquely prevalent at the interfaces between the "miscellaneous-virus" subfamily and the three inhibitors. This prompts speculation about how important this difference in IFR characteristics is for maintaining virulence of those organisms.
Our work here provides a unique tool for structure/function relationship analysis as well as a compilation of indicators detailing how the specificity of various serine proteases may have been achieved and/or could be altered. It also indicates that the interface forming residues, which also determine the specificity of a serine protease subfamily, cannot be presented in a canonical way but rather as a matrix of alternative populations of amino acids occupying a variety of IFR positions.
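The notion of an IFR table can be illustrated with a minimal sketch. It assumes that per-residue solvent-accessible surface areas for the free and the docked protease have already been computed by an external tool; the function names, the `min_loss` threshold and the toy values are hypothetical and are not taken from the study.

```python
from collections import defaultdict

def interface_residues(sasa_free, sasa_complex, min_loss=1.0):
    """Return aligned positions whose solvent accessibility drops on binding.

    sasa_free / sasa_complex: dict mapping aligned position -> (residue, area in Å^2).
    min_loss is an illustrative threshold, not a value from the study.
    """
    ifr = {}
    for pos, (res, free_area) in sasa_free.items():
        _, bound_area = sasa_complex[pos]
        if free_area - bound_area >= min_loss:
            ifr[pos] = res
    return ifr

def occupancy_matrix(ifr_per_complex):
    """Count, per aligned position, how often each amino acid occurs at the interface."""
    matrix = defaultdict(lambda: defaultdict(int))
    for ifr in ifr_per_complex:
        for pos, res in ifr.items():
            matrix[pos][res] += 1
    return {pos: dict(counts) for pos, counts in matrix.items()}

# Toy data for two hypothetical protease-inhibitor complexes.
free_a = {57: ("H", 40.0), 102: ("D", 5.0), 195: ("S", 30.0)}
bound_a = {57: ("H", 10.0), 102: ("D", 5.0), 195: ("S", 2.0)}
free_b = {57: ("H", 38.0), 102: ("D", 6.0), 195: ("G", 25.0)}
bound_b = {57: ("H", 37.5), 102: ("D", 6.0), 195: ("G", 3.0)}

complexes = [
    interface_residues(free_a, bound_a),
    interface_residues(free_b, bound_b),
]
table = occupancy_matrix(complexes)
print(table)
```

Each row of the resulting table is an aligned position and each entry counts how many complexes place a given amino acid at the interface there, which is one simple way to represent alternative residue populations rather than a single canonical residue.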
Serine proteases account for over a third of all known proteolytic enzymes; they are involved in a variety of physiological processes and are classified into clans sharing structural homology. The PA clan of endopeptidases is the most abundant and over two thirds of this clan is comprised of the S1 family of serine proteases, which bear the archetypal trypsin fold and have a catalytic triad in the order Histidine, Aspartate, Serine. These proteases have been studied in depth and many three dimensional structures have been experimentally determined. However, these structures mostly consist of bacterial and animal proteases, with a small number of plant and fungal proteases and as yet no structures have been determined for protozoa or archaea. The core structure and active site geometry of these proteases is of interest for many applications. This study investigated the structural properties of different S1 family serine proteases from a diverse range of taxa using molecular modeling techniques.
Our predicted models from protozoa, archaea, fungi and plants were combined with the experimentally determined structures of 16 S1 family members and used for analysis of the catalytic core. Amino acid sequences were submitted to SWISS-MODEL for homology-based structure prediction or the LOOPP server for threading-based structure prediction. Predicted models were refined using INSIGHT II and SCWRL and validated against experimental structures. Investigation of secondary structures and electrostatic surface potential was performed using MOLMOL. The structural geometry of the catalytic core shows clear deviations between taxa, but the relative positions of the catalytic triad residues were conserved. Some highly conserved residues potentially contributing to the stability of the structural core were identified. Evolutionary divergence was also exhibited by large variation in secondary structure features outside the core, differences in overall amino acid distribution, and unique surface electrostatic potential patterns between species.
Encompassing a wide range of taxa, our structural analysis provides an evolutionary perspective on S1 family serine proteases. Focusing on the common core containing the catalytic site of the enzyme, this analysis is beneficial for future molecular modeling strategies and structural analysis of serine protease models.
Serine protease; PA clan; Homology; Threading; Modeling
Autotransporter (AT) is a protein secretion pathway found in Gram-negative bacteria featuring a multidomain polypeptide with a signal sequence, a passenger domain, and a translocator domain. An AT subfamily named serine protease ATs of the family Enterobacteriaceae (SPATEs) is characterized by the presence of a conserved serine protease motif in the passenger domain which contributes to bacterial pathogenesis. The goal of the current study is to determine the importance of the passenger domain conserved residues in the SPATE proteolytic and adhesive functions using the temperature-sensitive hemagglutinin (Tsh) protein as our model. To begin, mutations of 21 fully conserved residues in the four passenger domain conserved motifs were constructed by PCR-based site-directed mutagenesis. Seventeen mutants exhibited a wild-type secretion level; among these mutants, eight displayed reduced proteolytic activities in Tsh-specific oligopeptide and mucin cleavage assays. These eight mutants also demonstrated lower affinities to extracellular matrix proteins, collagen IV, and fibronectin. These eight conserved residues were analyzed by molecular graphics modeling to demonstrate their intramolecular interactions with the catalytic triad and other key residues. Additional mutations were made to confirm the above interactions in order to demonstrate their significance to the SPATE functions. Altogether our data suggest that certain conserved residues in the SPATE passenger domain are important for both the proteolytic and adhesive activities of SPATE by maintaining the proper protein structure via intramolecular interactions between the protease and β-helical domains. Here, we provide new insight into the structure-function relationship of the SPATEs and the functional roles of their conserved residues.
Automatic extraction of motifs from biological sequences is an important research problem in the study of molecular biology. For proteins, it is desirable to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually widely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to substantially reduce the mining cost.
WildSpan is shown to efficiently find W-patterns containing conserved residues that are far separated in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and the performance of WildSpan. The protein-based mining mode of WildSpan is developed for discovering functional regions of a single protein by referring to a set of related sequences (e.g. its homologues). The discovered W-patterns are used to characterize the protein sequence and the results are compared with the conserved positions identified by multiple sequence alignment (MSA). The family-based mining mode of WildSpan is developed for extracting sequence signatures for a group of related proteins (e.g. a protein family) for protein function classification. In this situation, the discovered W-patterns are compared with PROSITE patterns as well as the patterns generated by three existing methods performing a similar task. Finally, analysis of WildSpan's execution time reveals that the proposed pruning strategy is effective in improving the scalability of the proposed algorithm.
The mining results obtained in this study reveal that WildSpan is efficient and effective in discovering functional signatures of proteins directly from sequences. The proposed pruning strategy is effective in improving the scalability of WildSpan. It is demonstrated in this study that the W-patterns discovered by WildSpan provide useful information for characterizing protein sequences. The WildSpan executable and open-source code are available on the web (http://biominer.csie.cyu.edu.tw/wildspan).
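To make the structure of a W-pattern concrete, the sketch below matches a block-and-gap motif against a sequence by compiling it into a regular expression. It illustrates only what such a pattern looks like once found; it is not the WildSpan mining algorithm, and the representation (a list of blocks plus a list of gap-length bounds), the function names and the toy sequence are assumptions made for illustration.

```python
import re

def w_pattern_to_regex(blocks, gaps):
    """Compile a W-pattern-like motif into a regular expression.

    blocks: list of strings; 'x' inside a block stands for a single wildcard.
    gaps: list of (min_len, max_len) tuples, one between each pair of blocks.
    This representation is hypothetical and used for illustration only.
    """
    if len(gaps) != len(blocks) - 1:
        raise ValueError("need exactly one gap between consecutive blocks")
    parts = []
    for i, block in enumerate(blocks):
        parts.append("".join("." if c == "x" else re.escape(c) for c in block))
        if i < len(gaps):
            lo, hi = gaps[i]
            parts.append(".{%d,%d}" % (lo, hi))
    return re.compile("".join(parts))

def find_w_pattern(sequence, blocks, gaps):
    """Return (start, end) of the first match, or None if the motif is absent."""
    match = w_pattern_to_regex(blocks, gaps).search(sequence)
    return (match.start(), match.end()) if match else None

# Toy sequence: three short conserved blocks separated by long spacer regions.
seq = "MKTAHC" + "L" * 10 + "DIA" + "L" * 12 + "GDSGGPAA"
hit = find_w_pattern(seq, ["AHC", "DxA", "GDSGG"], [(5, 20), (5, 20)])
miss = find_w_pattern(seq, ["AHC", "DxA", "GDSGG"], [(1, 5), (5, 20)])
print(hit, miss)
```

The short blocks stand for locally conserved clusters of residues, while the wide gap bounds allow those clusters to lie far apart in the sequence, which is the situation the constraint model is designed to capture.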
The gene family of subtilisin-like serine proteases (subtilases) in Arabidopsis thaliana comprises 56 members, divided into six distinct subfamilies. Whereas the members of five subfamilies are similar to pyrolysins, two genes share stronger similarity to animal kexins. Mutant screens confirmed 144 T-DNA insertion lines with knockouts for 55 out of the 56 subtilases. Apart from SDD1, none of the confirmed homozygous mutants revealed any obvious visible phenotypic alteration during growth under standard conditions. Apart from this specific case, forward genetics gave us no hints about the function of the individual 54 non-characterized subtilase genes. Therefore, the main objective of our work was to overcome the shortcomings of the forward genetic approach and to infer alternative experimental approaches by using an integrative bioinformatics and biological approach. Computational analyses based on transcriptional co-expression and co-response pattern revealed at least two expression networks, suggesting that functional redundancy may exist among subtilases with limited similarity. Furthermore, two hubs were identified, which may be involved in signalling or may represent higher-order regulatory factors involved in responses to environmental cues. A particular enrichment of co-regulated genes with metabolic functions was observed for four subtilases possibly representing late responsive elements of environmental stress. The kexin homologs show stronger associations with genes of transcriptional regulation context. Based on the analyses presented here and in accordance with previously characterized subtilases, we propose three main functions of subtilases: involvement in (i) control of development, (ii) protein turnover, and (iii) action as downstream components of signalling cascades. Supplemental material is available in the Plant Subtilase Database (PSDB), as well as from the CSB.DB (http://csbdb.mpimp-golm.mpg.de).
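As a rough illustration of how co-expression networks and hubs can be derived from expression profiles, the sketch below links genes whose profiles are strongly correlated and counts the links per gene. The use of Pearson correlation, the cutoff of 0.9, the gene labels and the expression values are all assumptions for illustration; the abstract does not specify the measure used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_degrees(profiles, cutoff=0.9):
    """Count, per gene, the partners whose absolute correlation reaches the cutoff."""
    genes = sorted(profiles)
    degree = {g: 0 for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(profiles[g], profiles[h])) >= cutoff:
                degree[g] += 1
                degree[h] += 1
    return degree

# Hypothetical expression values across four conditions.
profiles = {
    "SBT_a": [1.0, 2.0, 3.0, 4.0],
    "SBT_b": [2.0, 4.0, 6.0, 8.0],
    "SBT_c": [1.1, 2.0, 2.9, 4.2],
    "SBT_d": [4.0, 1.0, 3.0, 2.0],
}
degree = coexpression_degrees(profiles)
print(degree)
```

Genes with the highest degree in such a graph are candidate hubs; in a real analysis the cutoff would be chosen with a significance test rather than fixed by hand.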
The first complete plant genome sequence was that of Arabidopsis thaliana, a common weed. The number of genes in the Arabidopsis genome is estimated to be around 25,000. The functions of most of these genes are, however, still unknown. Many genes are grouped into gene families due to conserved sequences and predicted protein structures. In this article, the large subtilisin-like serine protease (subtilase) family of Arabidopsis is analysed. Although 56 subtilase genes have been identified in Arabidopsis, the function of only two subtilases is known. Analysis of mutants has revealed no further hints about the function of the other 54 subtilases. Here the authors present a novel approach to infer hypotheses about functions of the subtilase genes using computational analysis. Based on the analyses presented here and in accordance with previously characterized subtilases, they propose three main functions of subtilases: involvement in (i) control of development, (ii) protein degradation, and (iii) signalling. The results presented can be used to direct further analysis to elucidate functions of subtilases in plants.
Serine protease inhibitors act as modulators of serine proteases, playing important roles in protecting animal toxin peptides from degradation. However, all known serine protease inhibitors discovered thus far from animal venom belong to the Kunitz-type subfamily, and whether there are other novel types of protease inhibitors in animal venom remains unclear.
Here, by screening scorpion venom gland cDNA libraries, we identified the first Ascaris-type animal toxin family, which contains four members: Scorpiops jendeki Ascaris-type protease inhibitor (SjAPI), Scorpiops jendeki Ascaris-type protease inhibitor 2 (SjAPI-2), Chaerilus tricostatus Ascaris-type protease inhibitor (CtAPI), and Buthus martensii Ascaris-type protease inhibitor (BmAPI). The detailed characterization of Ascaris-type peptide SjAPI from the venom gland of scorpion Scorpiops jendeki was carried out. The mature peptide of SjAPI contains 64 residues and possesses a classical Ascaris-type cysteine framework reticulated by five disulfide bridges, different from all known protease inhibitors from venomous animals. Enzyme and inhibitor reaction kinetics experiments showed that recombinant SjAPI was a dual function peptide with α-chymotrypsin- and elastase-inhibiting properties. Recombinant SjAPI inhibited α-chymotrypsin with a Ki of 97.1 nM and elastase with a Ki of 3.7 μM, respectively. Bioinformatics analyses and chimera experiments indicated that SjAPI contained the unique short side chain functional residues “AAV” and might be a useful template to produce new serine protease inhibitors.
To our knowledge, SjAPI is the first functionally characterized animal toxin peptide with an Ascaris-type fold. The structural and functional diversity of animal toxins with protease-inhibiting properties suggested that bioactive peptides from animal venom glands might be a new source of protease inhibitors, which will accelerate the development of diagnostic and therapeutic agents for human diseases that target diverse proteases.
Protein dynamics and the underlying networks of intramolecular interactions and communicating residues within the three-dimensional (3D) structure are known to influence protein function and stability, as well as to modulate conformational changes and allostery. The acylaminoacyl peptidase (AAP) subfamily of enzymes belongs to a unique class of serine proteases, the prolyl oligopeptidase (POP) family, which has not been thoroughly investigated yet. POPs have a characteristic multidomain three-dimensional architecture with the active site at the interface of the C-terminal catalytic domain and a β-propeller domain, whose N-terminal region acts as a bridge to the hydrolase domain. In the present contribution, protein dynamics signatures of a hyperthermophilic acylaminoacyl peptidase (AAP) of the prolyl oligopeptidase (POP) family, as well as of a deletion variant and alanine mutants (I12A, V13A, V16A, L19A, I20A), are reported. In particular, we aimed at identifying residues crucial for long-range communication to the catalytic site or for promoting the conformational changes that switch ApAAP from closed to open conformations. Our investigation shows that the N-terminal α1-helix mediates structural intramolecular communication to the catalytic site, contributing to the maintenance of a proper functional architecture of the catalytic triad. The main determinants of the effects induced by the α1-helix are a subset of hydrophobic residues (V16, L19 and I20). Moreover, a subset of residues characterized by relevant interaction networks or coupled motions has been identified, which are likely to modulate the conformational properties at the interdomain interface.
Function prediction by homology is widely used to provide preliminary functional annotations for genes for which experimental evidence of function is unavailable or limited. This approach has been shown to be prone to systematic error, including percolation of annotation errors through sequence databases. Phylogenomic analysis avoids these errors in function prediction but has been difficult to automate for high-throughput application. To address this limitation, we present a computationally efficient pipeline for phylogenomic classification of proteins. This pipeline uses the SCI-PHY (Subfamily Classification in Phylogenomics) algorithm for automatic subfamily identification, followed by subfamily hidden Markov model (HMM) construction. A simple and computationally efficient scoring scheme using family and subfamily HMMs enables classification of novel sequences to protein families and subfamilies. Sequences representing entirely novel subfamilies are differentiated from those that can be classified to subfamilies in the input training set using logistic regression. Subfamily HMM parameters are estimated using an information-sharing protocol, enabling subfamilies containing even a single sequence to benefit from conservation patterns defining the family as a whole or in related subfamilies. SCI-PHY subfamilies correspond closely to functional subtypes defined by experts and to conserved clades found by phylogenetic analysis. Extensive comparisons of subfamily and family HMM performances show that subfamily HMMs dramatically improve the separation between homologous and non-homologous proteins in sequence database searches. Subfamily HMMs also provide extremely high specificity of classification and can be used to predict entirely novel subtypes. The SCI-PHY Web server at http://phylogenomics.berkeley.edu/SCI-PHY/ allows users to upload a multiple sequence alignment for subfamily identification and subfamily HMM construction. 
Biologists wishing to provide their own subfamily definitions can do so. Source code is available on the Web page. The Berkeley Phylogenomics Group PhyloFacts resource contains pre-calculated subfamily predictions and subfamily HMMs for more than 40,000 protein families and domains at http://phylogenomics.berkeley.edu/phylofacts/.
Predicting the function of a gene or protein (gene product) from its primary sequence is a major focus of many bioinformatics methods. In this paper, the authors present a three-stage computational pipeline for gene functional annotation in an evolutionary framework to reduce the systematic errors associated with the standard protocol (annotation transfer from predicted homologs). In the first stage, a functional hierarchy is estimated for each protein family and subfamilies are identified. In the second stage, hidden Markov models (HMMs) (a type of statistical model) are constructed for each subfamily to model both the family-defining and subfamily-specific signatures. In the third stage, subfamily HMMs are used to assign novel sequences to functional subtypes. Extensive experimental validation of these methods shows that predicted subfamilies correspond closely to functional subtypes identified by experts and to conserved clades in phylogenetic trees; that subfamily HMMs increase the separation between homologs and non-homologs in sequence database discrimination tests relative to the use of a single HMM for the family; and that specificity of classification of novel sequences to subfamilies using subfamily HMMs is near perfect (1.5% error rate when sequences are assigned to the top-scoring subfamily, and <0.5% error rate when logistic regression of scores is employed).
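The final classification stage can be sketched as follows, assuming that a score against each subfamily HMM has already been computed by an external HMM tool. The features (top score and margin over the runner-up), the logistic regression weights and the score values are hypothetical placeholders, not the parameters used by SCI-PHY.

```python
import math

def classify(subfamily_scores, weights=(-4.0, 0.05, 0.1), threshold=0.5):
    """Assign a sequence to its top-scoring subfamily, or flag it as novel.

    subfamily_scores: dict mapping subfamily name -> HMM score (precomputed).
    weights: (intercept, coefficient for top score, coefficient for margin);
             illustrative values only.
    Returns (subfamily name or None, probability that the assignment is valid).
    """
    ranked = sorted(subfamily_scores.items(), key=lambda kv: kv[1], reverse=True)
    best, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    margin = best_score - runner_up
    b0, b1, b2 = weights
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * best_score + b2 * margin)))
    return (best if p >= threshold else None), p

# A confident assignment and a weak one that is treated as a potential novel subtype.
label, p = classify({"trypsin": 120.0, "chymotrypsin": 80.0, "elastase": 60.0})
label_weak, p_weak = classify({"trypsin": 20.0, "chymotrypsin": 18.0})
print(label, round(p, 4), label_weak, round(p_weak, 4))
```

Returning `None` for low-probability assignments mirrors the idea of separating sequences that fit a known subfamily from those that may represent an entirely novel one.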
A new extracellular protease (PoSl; Pleurotus ostreatus subtilisin-like protease) from P. ostreatus culture broth has been purified and characterized. PoSl is a monomeric glycoprotein with a molecular mass of 75 kDa, a pI of 4.5, and an optimum pH in the alkaline range. The inhibitory profile indicates that PoSl is a serine protease. The N-terminal and three tryptic peptide sequences of PoSl have been determined. The homology of one internal peptide with conserved sequence around the Asp residue of the catalytic triad in the subtilase family suggests that PoSl is a subtilisin-like protease. This hypothesis is further supported by the finding that PoSl hydrolysis sites of the insulin B chain match those of subtilisin. PoSl activity is positively affected by calcium. A 10-fold decrease in the Km value in the presence of calcium ions can reflect an induced structural change in the substrate recognition site region. Furthermore, Ca2+ binding slows PoSl autolysis, triggering the protein to form a more compact structure. These effects have already been observed for subtilisin and other serine proteases. Moreover, PoSl protease seems to play a key role in the regulation of P. ostreatus laccase activity by degrading and/or activating different isoenzymes.
The gene encoding serine alkaline protease (SapSh) of the psychrotrophic bacterium Shewanella strain Ac10 was cloned in Escherichia coli. The amino acid sequence deduced from the 2,442-bp nucleotide sequence revealed that the protein was 814 amino acids long and had an estimated molecular weight of 85,113. SapSh exhibited sequence similarities with members of the subtilisin family of proteases, and there was a high level of conservation in the regions around a putative catalytic triad consisting of Asp-30, His-65, and Ser-369. The amino acid sequence contained the following regions which were assigned on the basis of homology to previously described sequences: a signal peptide (26 residues), a propeptide (117 residues), and an extension up to the C terminus (about 250 residues). Another feature of SapSh is the fact that the space between His-65 and Ser-369 is approximately 150 residues longer than the corresponding spaces in other proteases belonging to the subtilisin family. SapSh was purified to homogeneity from the culture supernatant of E. coli recombinant cells by affinity chromatography with a bacitracin-Sepharose column. The recombinant SapSh (rSapSh) was found to have a molecular weight of about 44,000 and to be highly active in the alkaline region (optimum pH, around 9.0) when azocasein and synthetic peptides were used as substrates. rSapSh was characterized by its high levels of activity at low temperatures; it was five times more active than subtilisin Carlsberg at temperatures ranging from 5 to 15°C. The activation energy for hydrolysis of azocasein by rSapSh was much lower than the activation energy for hydrolysis of azocasein by the subtilisin. However, rSapSh was far less stable than the subtilisin.
Parasite proteases play key roles in several fundamental steps of the Plasmodium life cycle, including haemoglobin degradation, host cell invasion and parasite egress. Plasmodium exit from infected host cells appears to be mediated by a class of papain-like cysteine proteases called ‘serine repeat antigens’ (SERAs). A SERA subfamily, represented by Plasmodium falciparum SERA5, contains an atypical active site serine residue instead of a catalytic cysteine. Members of this SERAser subfamily are abundantly expressed in asexual blood stages, rendering them attractive drug and vaccine targets. In this study, we show by antibody localization and in vivo fluorescent tagging with the red fluorescent protein mCherry that the two P. berghei serine-type family members, PbSERA1 and PbSERA2, display differential expression towards the final stages of merozoite formation. Via targeted gene replacement, we generated single and double gene knockouts of the P. berghei SERAser genes. These loss-of-function lines progressed normally through the parasite life cycle, suggesting a specialized, non-vital role for serine-type SERAs in vivo. Parasites lacking PbSERAser showed increased expression of the cysteine-type PbSERA3. Compensatory mechanisms between distinct SERA subfamilies may thus explain the absence of phenotypical defect in SERAser disruptants, and challenge the suitability to develop potent antimalarial drugs based on specific inhibitors of Plasmodium serine-type SERAs.
Proteolytic enzymes have evolved several mechanisms to cleave peptide bonds. These distinct types have been systematically categorized in the MEROPS database. While a BLAST search on these proteases identifies homologous proteins, sequence alignment methods often fail to identify relationships arising from convergent evolution, exon shuffling, and modular reuse of catalytic units. We have previously established a computational method to detect functions in proteins based on the spatial and electrostatic properties of the catalytic residues (CLASP). CLASP identified a promiscuous serine protease scaffold in alkaline phosphatases (AP) and a scaffold recognizing a β-lactam (imipenem) in a cold-active Vibrio AP. Subsequently, we defined a methodology to quantify promiscuous activities in a wide range of proteins. Here, we assemble a module which encapsulates the multifarious motifs used by protease families listed in the MEROPS database. Since APs and proteases are an integral component of outer membrane vesicles (OMV), we sought to query other OMV proteins, like phospholipase C (PLC), using this search module. Our analysis indicated that phosphoinositide-specific PLC from Bacillus cereus is a serine protease. This was validated by protease assays, mass spectrometry and by inhibition of the native phospholipase activity of PI-PLC by the well-known serine protease inhibitor AEBSF (IC50 = 0.018 mM). Edman degradation analysis linked the specificity of the protease activity to a proline in the amino terminal, suggesting that the PI-PLC is a prolyl peptidase. Thus, we propose a computational method of extending protein families based on the spatial and electrostatic congruence of active site residues.
Aedes aegypti utilizes blood for energy production, egg maturation and replenishment of maternal reserves. The principal midgut enzymes responsible for bloodmeal digestion are endoproteolytic serine-type proteases within the S1.A subfamily. While there are hundreds of serine protease-like genes in the A. aegypti genome, only five are known to be expressed in the midgut. We describe the cloning, sequencing and expression profiling of seven additional serine proteases and provide a genomic and phylogenetic assessment of these findings. Of the seven genes, four are constitutively expressed and three are transcriptionally induced upon blood feeding. The amount of transcriptional induction is strongly correlated among these genes. Alignments reveal that, in general, the conserved catalytic triad, active site and accessory catalytic residues are maintained in these genes and phylogenetic analysis shows that these genes fall within three distinct clades: trypsins, chymotrypsins and serine collagenases. Interestingly, a previously described trypsin consistently arose with other serine collagenases in phylogenetic analyses. These results suggest that multiple gene duplications have arisen within the S1.A subfamily of midgut serine proteases and/or that A. aegypti has evolved an array of proteases with a broad range of substrate specificities for rapid, efficient digestion of bloodmeals.
Aedes aegypti; Midgut; Serine proteases
Hepatitis C virus (HCV) currently infects approximately three percent of the world population. In view of the lack of vaccines against HCV, there is an urgent need for an efficient treatment of the disease by an effective antiviral drug. Rational drug design has not been the primary way of discovering major therapeutics. Nevertheless, there are reports of success in the development of inhibitors using a structure-based approach. One of the possible targets for drug development against HCV is the NS3 protease variants. Based on the three-dimensional structure of these variants we expect to identify new NS3 protease inhibitors. In order to speed up the modeling process, all NS3 protease variant models were generated on a Beowulf cluster. The potential of structural bioinformatics for the development of new antiviral drugs is discussed.
The atomic coordinates of crystallographic structures 1CU1 and 1DY9 were used as starting models for modeling the NS3 protease variant structures. The NS3 protease variant structures are composed of six subdomains, which occur in sequence along the polypeptide chain. The protease domain exhibits the dual beta-barrel fold that is common among members of the chymotrypsin serine protease family. The helicase domain contains two structurally related beta-alpha-beta subdomains and a third subdomain of seven helices and three short beta strands. The latter domain is usually referred to as the helicase alpha-helical subdomain. The rmsd values of bond lengths and bond angles, the average G-factor and Verify 3D values are presented for the NS3 protease variant structures.
This project increases the certainty that homology modeling is a useful tool in structural biology and that it can be very valuable in annotating genome sequence information and contributing to the structural and functional genomics of viruses. The structural models will be used to guide future efforts in the structure-based drug design of a new generation of NS3 protease variant inhibitors. All models in the database are publicly accessible via our interactive website, providing us with a large number of structural models for use in protein-ligand docking analysis.
Granzymes, a family of serine proteases, are expressed exclusively by cytotoxic T lymphocytes and natural killer (NK) cells, components of the immune system that protect higher organisms against viral infection and cellular transformation. Following receptor-mediated conjugate formation between a granzyme-containing cell and an infected or transformed target cell, granzymes enter the target cell via endocytosis and induce apoptosis. Granzyme B is the most powerful pro-apoptotic member of the granzyme family. Like caspases, cysteine proteases that play an important role in apoptosis, it can cleave proteins after acidic residues, especially aspartic acid. Other granzymes may serve additional functions, and some may not induce apoptosis. Granzymes have been well characterized only in humans and rodents, and can be grouped into three subfamilies according to substrate specificity: members of the granzyme family that have enzymatic activity similar to the serine protease chymotrypsin are encoded by a gene cluster termed the 'chymase locus'; granzymes with trypsin-like specificities are encoded by the 'tryptase locus'; and a third subfamily cleaves after unbranched hydrophobic residues, especially methionine, and is encoded by the 'Met-ase locus'. All granzymes are synthesized as zymogens and, after clipping of the leader peptide, maximal enzymatic activity is achieved by removal of an amino-terminal dipeptide. They can all be blocked by serine protease inhibitors, and a new group of inhibitors has recently been identified: serpins, some of which are specific for granzymes. Future studies of serpins may bring insights into how cells that synthesize granzymes are protected from inadvertent cell suicide.
Computational methods are increasingly gaining importance as an aid in identifying active sites. Most of these methods use structural information to supplement sequence-conservation-based analyses. Development of tools that compute electrostatic potentials has further improved our ability to characterize the active site residues in proteins. We have described a computational methodology for detecting active sites based on structural and electrostatic conformity - CataLytic Active Site Prediction (CLASP). In our pipelined model, the physical 3D signature of any particular enzymatic function, as defined by its active sites, is used to obtain spatially congruent matches. While previous work has revealed that catalytic residues have large pKa deviations from standard values, we show that for a given enzymatic activity, the electrostatic potential differences (PD) between analogous residue pairs in an active site taken from different proteins of the same family are similar. False positives in spatially congruent matches are further pruned by PD analysis, in which cognate pairs with large deviations are rejected. We first present the results of active site prediction by CLASP for two enzymatic activities - β-lactamases and serine proteases, two of the most extensively investigated enzymes. The results of CLASP analysis on motifs extracted from the Catalytic Site Atlas (CSA) are also presented in order to demonstrate its ability to accurately classify any protein, putative or otherwise, with known structure. The source code and database are made available at www.sanchak.com/clasp/. Subsequently, we probed alkaline phosphatases (AP), among the well-known promiscuous enzymes, for additional activities. This search has led us to predict a hitherto unknown function of shrimp alkaline phosphatase (SAP), where the protein acts as a protease. Finally, we present experimental evidence for the CLASP prediction by showing that SAP indeed has protease activity in vitro.
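The PD pruning step described above can be illustrated with a minimal sketch. This is not the CLASP source code: the function names, the tolerance parameter and the potential values (arbitrary units) are hypothetical, and the sketch assumes that the electrostatic potential at each active-site residue has already been computed by an external tool.

```python
from itertools import combinations


def potential_differences(potentials):
    """Return the potential difference for every residue pair (i, j), i < j."""
    return {
        (i, j): potentials[i] - potentials[j]
        for i, j in combinations(range(len(potentials)), 2)
    }


def passes_pd_filter(reference, candidate, tolerance):
    """Accept a spatially congruent match only if every cognate residue pair
    has a PD within `tolerance` of the reference motif's PD.

    `reference` and `candidate` list the potentials of the motif residues in
    the same order; `tolerance` is a hypothetical cutoff, not a CLASP value.
    """
    if len(reference) != len(candidate):
        raise ValueError("motifs must contain the same number of residues")
    ref_pd = potential_differences(reference)
    cand_pd = potential_differences(candidate)
    return all(abs(ref_pd[pair] - cand_pd[pair]) <= tolerance for pair in ref_pd)


# Hypothetical potentials for a three-residue motif (e.g. a catalytic triad).
reference_motif = [-100.0, 50.0, 20.0]
similar_match = [-80.0, 65.0, 45.0]    # absolute values shifted, PDs similar
dissimilar_match = [-100.0, -90.0, 20.0]

print(passes_pd_filter(reference_motif, similar_match, tolerance=15.0))
print(passes_pd_filter(reference_motif, dissimilar_match, tolerance=15.0))
```

Comparing differences rather than absolute potentials means that a uniform offset between two structures does not by itself cause a rejection, which is the property the PD criterion relies on.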
We characterized the calcineurin (CaN) gene family, including the subunits CaNA and CaNB, based upon sequence information obtained from the Paramecium genome project. Paramecium tetraurelia has seven subfamilies of the catalytic CaNA subunit and one subfamily of the regulatory CaNB subunit, with each subfamily having two members of considerable identity on the amino acid level (≥55% between subfamilies, ≥94% within CaNA subfamilies, and full identity in the CaNB subfamily). Within CaNA subfamily members, the catalytic domain and the CaNB binding region are highly conserved and molecular modeling revealed a three-dimensional structure almost identical to a human ortholog. At 14 members, the size of the CaNA family is unprecedented, and we hypothesized that the different CaNA subfamily members were not strictly redundant and that at least some fulfill different roles in the cell. This was tested by selecting two phylogenetically distinct members of this large family for posttranscriptional silencing by RNA interference. The two targets resulted in differing effects in exocytosis, calcium dynamics, and backward swimming behavior that supported our hypothesis that the large, highly conserved CaNA family members are not strictly redundant and that at least two members have evolved diverse but overlapping functions. In sum, the occurrence of CaN in Paramecium spp., although disputed in the past, has been established on a molecular level. Its role in exocytosis and ciliary beat regulation in a protozoan, as well as in more complex organisms, suggests that these roles for CaN were acquired early in the evolution of this protein family.
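Identity values such as those quoted above depend on how percent identity is defined. The following minimal sketch assumes two sequences taken from the same alignment and counts identical residues over the columns in which neither sequence has a gap; other denominators (for example the alignment length or the shorter sequence length) are also in common use. The function name and the toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over alignment columns where neither sequence is gapped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


# Toy aligned fragments, not real CaNA sequences.
print(percent_identity("MKTA", "MRTA"))
print(percent_identity("MKT-LV", "MKTALV"))
```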
Diacylglycerol acyltransferase families (DGATs) catalyze the final and rate-limiting step of triacylglycerol (TAG) biosynthesis in eukaryotic organisms. Understanding the roles of DGATs will help to create transgenic plants with value-added properties and provide clues for therapeutic intervention for obesity and related diseases. The objective of this analysis was to identify conserved sequence motifs and amino acid residues for better understanding of the structure-function relationship of these important enzymes.
A total of 117 DGAT sequences from 70 organisms, including plants, animals, fungi and humans, were obtained by database searches using tung tree DGATs as queries. Phylogenetic analysis separates these proteins into DGAT1 and DGAT2 subfamilies. These DGATs are integral membrane proteins with more than 40% of the total amino acid residues being hydrophobic. They have similar properties and amino acid composition except that DGAT1s are approximately 20 kDa larger than DGAT2s. DGAT1s and DGAT2s have 41 and 16 completely conserved amino acid residues, respectively, although only two of them are shared by all DGATs. These residues are distributed in 7 and 6 sequence blocks for DGAT1s and DGAT2s, respectively, and are located at the carboxyl termini, suggesting the location of the catalytic domains. These conserved sequence blocks do not contain the putative neutral lipid-binding domain, mitochondrial targeting signal, or ER retrieval motif. The importance of conserved residues has been demonstrated by site-directed and natural mutants.
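Completely conserved residues of the kind counted above can be read directly from a multiple sequence alignment. The sketch below is a minimal illustration, assuming that the sequences of both subfamilies come from one joint alignment so that column indices are comparable; the function names and the toy alignments are hypothetical and are not DGAT data.

```python
def conserved_columns(alignment, gap="-"):
    """Return (column, residue) for columns with one residue type and no gaps."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in alignment}
        if len(residues) == 1 and gap not in residues:
            conserved.append((col, residues.pop()))
    return conserved


def shared_conserved(columns_a, columns_b):
    """Columns conserved with the same residue in both subfamilies."""
    return sorted(set(columns_a) & set(columns_b))


# Toy subfamily alignments taken from the same hypothetical joint alignment.
subfamily_1 = ["MFHW", "MYHW", "MFHW"]
subfamily_2 = ["MAHC", "MGHC"]

cols_1 = conserved_columns(subfamily_1)
cols_2 = conserved_columns(subfamily_2)
print(cols_1)
print(cols_2)
print(shared_conserved(cols_1, cols_2))
```

Computing conservation per subfamily first and intersecting afterwards mirrors the distinction drawn above between residues conserved within DGAT1s or DGAT2s and the much smaller set shared by all DGATs.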
This study has identified conserved sequence motifs and amino acid residues in all 117 DGATs and the two subfamilies. None of the completely conserved residues in DGAT1s and DGAT2s is present in recently reported isoforms in the multiple sequence alignment, raising the important question of how proteins with completely different amino acid sequences could perform the same biochemical reaction. The sequence analysis should facilitate studying the structure-function relationship of DGATs, with the ultimate goal of identifying critical amino acid residues for engineering superior enzymes in metabolic engineering and selecting enzyme inhibitors in therapeutic application for obesity and related diseases.
Serine proteases are an abundant class of enzymes that are involved in a wide range of physiological processes and are classified
into clans sharing structural homology. The active site of the subtilisin-like clan contains a catalytic triad in the order Asp, His, Ser
(S8 family) or a catalytic tetrad in the order Glu, Asp and Ser (S53 family). The core structure and active site geometry of these
proteases are of interest for many applications. The aim of this study was to investigate the structural properties of different S8
family serine proteases from a diverse range of taxa using molecular modeling techniques. In conjunction with 12 experimentally
determined three-dimensional structures of S8 family members, our predicted structures from an archaeon, protozoan and a plant
were used for analysis of the catalytic core. Amino acid sequences were obtained from the MEROPS database and submitted to the
LOOPP server for threading based structure prediction. The predicted structures were refined and validated using PROCHECK,
SCWRL and MODELYN. Investigation of secondary structures and electrostatic surface potential was performed using MOLMOL.
Encompassing a wide range of taxa, our structural analysis provides an evolutionary perspective on S8 family serine proteases.
Focusing on the common core containing the catalytic site of the enzyme, the analysis presented here is beneficial for future
molecular modeling strategies and structure-based rational drug design.
serine protease; SB clan; S8 family; homology; threading; modeling
Rhomboid proteases reside within cellular membranes, but the advantage of this unusual environment is unclear. We discovered that membrane immersion allows substrates to be identified in a fundamentally different way, based initially upon exposing ‘masked’ conformational dynamics of transmembrane segments rather than sequence-specific binding. EPR and CD spectroscopy revealed that the membrane restrains rhomboid gate and substrate conformation to limit proteolysis. True substrates evolved intrinsically unstable transmembrane helices that both become unstructured when not supported by the membrane and facilitate partitioning into the hydrophilic, active-site environment. Accordingly, manipulating substrate and gate dynamics in living cells shifted cleavage sites in a manner incompatible with extended sequence binding, but correlated with a membrane-and-helix-exit propensity scale. Moreover, cleavage of diverse non-substrates was provoked by single-residue changes that destabilize transmembrane helices. Membrane immersion thus bestows rhomboid proteases with the ability to identify substrates primarily based on reading their intrinsic transmembrane dynamics.
Proteases are enzymes that break the peptide bonds that hold proteins together, and have a central role in many physiological processes, including digestion, blood clotting and programmed cell death. An important characteristic of proteases is that they are highly selective, only cutting proteins that contain well-defined sequences of amino acids in accessible regions. Proteases that are soluble in water have been studied for over a century and are now well understood, as are proteases that need to be tethered to the membrane of a cell to work properly.
In 1997 researchers discovered a protease that was immersed in the cell membrane, and it soon became clear that these intramembrane proteases were widespread and involved in a wide range of processes in cells. Examples of intramembrane proteases include γ-secretase, which is implicated in Alzheimer's disease, and various site-2 proteases that regulate pathogenic circuits in bacteria.
There are many similarities between soluble and intramembrane proteases. However, given that intramembrane proteases evolved within the hydrophobic environment of the membrane, whereas soluble proteases evolved in an aqueous environment, there should also be significant differences between them. The best understood intramembrane proteases in terms of their biochemistry are probably the rhomboid proteases. However, most studies of their function have been performed in detergent systems rather than in real membranes.
Moin and Urban now report that the main strategy used by rhomboid proteases to identify the proteins that they selectively cut is completely different from that used by soluble proteases. Through a combination of biochemical and spectroscopic methods, they have discovered that rhomboid proteases identify the proteins they act on mainly by detecting changes in dynamic behavior: only those proteins that lose a stable helical structure when they exit the lipid phase to interact with the rhomboid protease will be cut by the rhomboid protease. Soluble proteases, on the other hand, achieve specificity by looking for proteins with a particular sequence of amino acids. The novel strategy used by rhomboid proteases allows them to patrol the membrane for unstable helices and selectively cut them. This discovery provides the first explanation of why these complicated enzymes evolved to have active sites immersed within the cell membrane.
intramembrane proteolysis; rhomboid protease; pathogen; D. melanogaster; E. coli; Human
GB virus B (GBV-B) is a recently discovered virus responsible for hepatitis in tamarins (Saguinus species). GBV-B belongs to the Flaviviridae family and is closely related to the human pathogen hepatitis C virus (HCV). Nonstructural protein 3 (NS3) of HCV has been shown to encompass a serine protease domain required for viral maturation. GBV-B and HCV share only about 30% of the amino acid sequence within the NS3 protease domain. The catalytic triad is conserved, and the residue Phe-154, presumed to be a crucial amino acid for determining the S1 specificity pocket of the HCV NS3 protease, is also conserved. We have expressed a synthetic gene encoding the GBV-B NS3 protease domain in Escherichia coli and have characterized the purified recombinant protein for its activity on HCV substrates. We have shown that the NS3 region of the GBV-B genome actually encodes a serine protease that, despite the low sequence homology, shares substrate specificity with the HCV NS3 protease.
Blood flukes (Schistosoma spp.) are parasites that can survive for years or decades in the vasculature of permissive mammalian hosts, including humans. Proteolytic enzymes (proteases) are crucial for successful parasitism, including aspects of invasion, maturation and reproduction. Most attention has focused on the ‘cercarial elastase’ serine proteases that facilitate skin invasion by infective schistosome larvae, and the cysteine and aspartic proteases that worms use to digest the blood meal. Apart from the cercarial elastases, information regarding other S. mansoni serine proteases (SmSPs) is limited. To address this, we investigated SmSPs using genomic, transcriptomic, phylogenetic and functional proteomic approaches.
Genes encoding five distinct SmSPs, termed SmSP1 - SmSP5, some of which comprise disparate protein domains, were retrieved from the S. mansoni genome database and annotated. Reverse transcription quantitative PCR (RT-qPCR) in various schistosome developmental stages indicated complex expression patterns for SmSPs, including their constituent protein domains. SmSP2 stood apart as being massively expressed in schistosomula and adult stages. Phylogenetic analysis segregated SmSPs into diverse clusters of family S1 proteases. SmSP1 to SmSP4 are trypsin-like proteases, whereas SmSP5 is chymotrypsin-like. In agreement, trypsin-like activities were shown to predominate in eggs, schistosomula and adults using peptidyl fluorogenic substrates. SmSP5 is particularly novel in the phylogenetics of family S1 schistosome proteases, as it is part of a cluster of sequences that fill a gap between the highly divergent cercarial elastases and other family S1 proteases.
Our series of post-genomics analyses clarifies the complexity of schistosome family S1 serine proteases and highlights their interrelationships, including the cercarial elastases and, not least, the identification of a ‘missing-link’ protease cluster, represented by SmSP5. A framework is now in place to guide the characterization of individual proteases, their stage-specific expression and their contributions to parasitism, in particular, their possible modulation of host physiology.
Schistosomes are blood flukes that live in the blood system and cause chronic and debilitating infection in hundreds of millions of people. Proteolytic enzymes (proteases) produced by the parasite allow it to survive and reproduce. We focused on understanding the repertoire of trypsin- and chymotrypsin-like Schistosoma mansoni serine proteases (SmSPs) using a variety of genomic, bioinformatics, RNA- and protein-based techniques. We identified five SmSPs that are produced at different stages of the parasite's development. Based on bioinformatics and cleavage preferences for small peptide substrates, SmSP1 to SmSP4 are trypsin-like, whereas SmSP5 is chymotrypsin-like. Interestingly, SmSP5 forms part of a ‘missing link’ group of enzymes between the specialized chymotrypsin-like ‘cercarial elastases’ that help the parasite invade human skin and the more typical chymotrypsins and trypsins found in nature. Our findings form a basis for further exploration of the functions of the individual enzymes, including their possible contributions to influencing host physiology.
The SPFH protein superfamily is a diverse family of proteins whose eukaryotic members are involved in the scaffolding of detergent-resistant microdomains. Recently the common origin of the SPFH proteins has been questioned, and convergent evolution has been proposed instead. However, an independent, convergent evolution of three large prokaryotic and three eukaryotic families is highly unlikely, especially when other mechanisms, such as lateral gene transfer, which could also explain their distribution pattern, have not yet been considered.
To gain better insight into this very diverse protein family, we have analyzed the genomes of 497 microorganisms and investigated the pattern of occurrence as well as the genomic vicinity of the prokaryotic SPFH members.
According to sequence and operon structure, a clear division into 12 subfamilies was evident. Three subfamilies (SPFH1, SPFH2 and SPFH5) show a conserved operon structure and two additional subfamilies are linked to those three through functional aspects (SPFH1, SPFH3, SPFH4: interaction with FtsH protease). Therefore these subgroups most likely share common ancestry. The complex pattern of occurrence among the different phyla is indicative of lateral gene transfer. Organisms that do not possess a single SPFH protein are almost exclusively endosymbionts or endoparasites.
The conserved operon structure and functional similarities suggest that at least 5 subfamilies that encompass almost 75% of all prokaryotic SPFH members share a common origin. Their similarity to the different eukaryotic SPFH families, as well as functional similarities, suggests that the eukaryotic SPFH families originated from different prokaryotic SPFH families rather than one. This explains the difficulties in obtaining a consistent phylogenetic tree of the eukaryotic SPFH members. Phylogenetic evidence points towards lateral gene transfer as one source of the very diverse patterns of occurrence in bacterial species.
The tautomerase superfamily consists of structurally homologous proteins that are characterized by a β–α–β fold and a catalytic amino-terminal proline. 4-Oxalocrotonate tautomerase (4-OT) family members have been identified and categorized into five subfamilies on the basis of multiple sequence alignments and the conservation of key catalytic and structural residues. Representative members from two subfamilies have been cloned, expressed, purified, and subjected to kinetic and structural characterization. The crystal structure of DmpI from Helicobacter pylori (HpDmpI), a 4-OT homologue in subfamily 3, has been determined to high resolution (1.8 Å and 2.1 Å) in two different space groups. HpDmpI is a homohexamer with an active site cavity that includes Pro-1, but lacks the equivalent of Arg-11 and Arg-39 found in 4-OT. Instead, the side chain of Lys-36 replaces that of Arg-11 in a manner similar to that observed in the trimeric macrophage migration inhibitory factor (MIF), which is the title protein of another family in the superfamily. The electrostatic surface of the active site is also quite different and suggests that HpDmpI might prefer small, monoacid substrates. A kinetic analysis of the enzyme is consistent with the structural analysis, but a biological role for the enzyme remains elusive. The crystal structure of DmpI from Archaeoglobus fulgidus (AfDmpI), a 4-OT homologue in subfamily 4, has been determined to 2.4 Å resolution. AfDmpI is also a homohexamer, with a proposed active site cavity that includes Pro-1, but lacks any other residues that are readily identified as catalytic ones related to 4-OT activity. Indeed, the electrostatic potential of the active site differs significantly in that it is mostly neutral, in contrast to the usual electropositive features found in other 4-OT family members, suggesting that AfDmpI might accommodate hydrophobic substrates. A kinetic analysis has been carried out, but does not provide any clues about the type of reaction the enzyme might catalyze.
4-oxalocrotonate tautomerase; catalytic proline; hexamer
The mechanism of intra-protein communication and allosteric coupling is key to understanding the structure-property relationship of protein function. For subtilisin Carlsberg, the Ca2+-binding loop is distal to substrate-binding and active sites, yet the serine protease function depends on Ca2+ binding. The atomic molecular dynamics (MD) simulations of apo and Ca2+-bound subtilisin show similar structures and there is no direct evidence that subtilisin has alternative conformations. To model the intra-protein communication due to Ca2+ binding, we transform the sequential segments of an atomic MD trajectory into separate elastic network models to represent anharmonicity and nonlinearity effectively as the temporal and spatial variation of the mechanical coupling network. In analogy to the spectrogram of sound waves, this transformation is termed the “fluctuogram” of protein dynamics. We illustrate that the Ca2+-bound and apo states of subtilisin have different fluctuograms and that intra-protein communication proceeds intermittently both in space and in time. We found that residues with large mechanical coupling variation due to Ca2+ binding correlate with the reported mutation sites selected by directed evolution for improving the stability of subtilisin and its activity in a non-aqueous environment. Furthermore, we utilize the fluctuograms calculated from MD to capture the highly correlated residues in a multiple sequence alignment. We show that in addition to the magnitude, the variance of coupling strength is also an indicative property for the sequence correlation observed in a statistical coupling analysis. The results of this work illustrate that the mechanical coupling networks calculated from atomic details can be used to correlate with functionally important mutation sites and co-evolution.
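The idea of converting sequential trajectory segments into a time series of coupling networks can be illustrated with a deliberately simplified sketch. It is not the fluctuogram procedure itself: as a stand-in assumption, each residue pair in each window is assigned an effective harmonic spring constant k = kT / var(d), where d is the pair distance within that window. The function names, the `min_var` floor and the toy trajectory are hypothetical.

```python
import math
from statistics import pstdev, pvariance


def windowed_coupling(traj, window, kT=1.0, min_var=1e-12):
    """Effective pair spring constants for consecutive trajectory windows.

    `traj` is a list of frames; each frame is a list of (x, y, z) residue
    positions. Returns one symmetric matrix (list of lists) per window.
    """
    n_res = len(traj[0])
    couplings = []
    for start in range(0, len(traj) - window + 1, window):
        segment = traj[start:start + window]
        k = [[0.0] * n_res for _ in range(n_res)]
        for i in range(n_res):
            for j in range(i + 1, n_res):
                distances = [math.dist(frame[i], frame[j]) for frame in segment]
                # Floor the variance so a rigid pair gives a large finite value.
                variance = max(pvariance(distances), min_var)
                k[i][j] = k[j][i] = kT / variance
        couplings.append(k)
    return couplings


def coupling_variation(couplings):
    """Standard deviation of each pair's coupling strength across windows."""
    n_res = len(couplings[0])
    return [
        [pstdev([window[i][j] for window in couplings]) for j in range(n_res)]
        for i in range(n_res)
    ]


# Toy trajectory: residue 0 fixed at the origin, residue 1 moving along x.
toy_traj = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]
toy_couplings = windowed_coupling(toy_traj, window=2)
print(toy_couplings[0][0][1], toy_couplings[1][0][1])
print(coupling_variation(toy_couplings)[0][1])
```

Comparing `coupling_variation` between two simulations (for example apo and ion-bound) would flag residue pairs whose coupling strength changes over time in one state but not the other, which is the kind of contrast the text draws between the two states of subtilisin.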
A hallmark of protein molecules is their machine-like behaviors while carrying out biological functions. At the molecular level, molecular signals such as binding a metal ion at an action site can cause long-range effects and alter protein function. Such phenomena are often referred to as intra-protein communication or allosteric coupling. Elucidating the underlying mechanisms could lead to novel discovery of molecular modulators to regulate protein function in a more specific and effective manner. A long-standing puzzle is the roles of the anharmonicity and nonlinearity in protein dynamics. To incorporate these characters in modeling intra-protein communication, we devise a “fluctuogram” analysis to record the choreography of allosteric coupling in an atomic molecular dynamics simulation. We show that fluctuogram analysis can bridge the results of physics-based simulation and sequence alignment in bioinformatics by capturing the residues that exhibit high correlation in a multiple sequence alignment. We also show that the fluctuograms calculated from atomic details have the potential to be applied as a tool to select mutation sites for modulating protein function.