Related Articles

1.  Analysis of binding properties and specificity through identification of the interface forming residues (IFR) for serine proteases in silico docked to different inhibitors 
Enzymes belonging to the same superfamily of proteins generally operate on a variety of substrates and are inhibited by a wide selection of inhibitors. In this work our main objective was to expand the scope of studies that consider only the catalytic and binding pocket amino acids when analyzing enzyme specificity and instead include a wider category, which we have named the Interface Forming Residues (IFR). We were motivated to identify those amino acids with decreased accessibility to solvent after docking of different types of inhibitors to subclasses of serine proteases and then create a table (matrix) of all amino acid positions at the interface as well as their respective occupancies. Our goal is to establish a platform for analysis of the relationship between IFR characteristics and binding properties/specificity for bi-molecular complexes.
We propose a novel method for describing binding properties and delineating serine protease specificity by compiling an exhaustive table of interface forming residues (IFR) for serine proteases and their inhibitors. Currently, the Protein Data Bank (PDB) does not contain all the data that our analysis would require. Therefore, an in silico approach was designed for building the corresponding complexes.
The IFRs are obtained by "rigid body docking" of 70 structurally aligned, sequence-wise non-redundant serine protease structures with 3 inhibitors: bovine pancreatic trypsin inhibitor (BPTI), ecotin and ovomucoid third domain inhibitor. A table (matrix) of all amino acid positions at the interface and their respective occupancies is created. We also developed a new computational protocol for predicting IFRs for those complexes that have not yet been determined experimentally, achieving an accuracy of at least 0.97.
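The IFR definition above (residues that lose solvent accessibility on docking, collected into a position-by-residue occupancy table) can be illustrated with a minimal Python sketch. It is not the authors' protocol: the function names, the `min_loss` cutoff, the position numbers and all accessibility values are made up for illustration, and per-residue accessibilities are assumed to have been computed beforehand by some external tool.

```python
from collections import Counter

def interface_residues(asa_free, asa_bound, min_loss=0.0):
    """Positions whose solvent accessibility drops by more than min_loss on docking."""
    return {pos for pos, free in asa_free.items()
            if free - asa_bound.get(pos, free) > min_loss}

def occupancy_table(ifr_by_protease, residue_at):
    """For each interface position, count which residue types occupy it."""
    table = {}
    for name, positions in ifr_by_protease.items():
        for pos in positions:
            table.setdefault(pos, Counter())[residue_at[name][pos]] += 1
    return table

# Toy, made-up accessibility values for two structurally aligned proteases,
# before (free) and after (bound) docking of an inhibitor.
free = {"p1": {57: 40.0, 189: 30.0, 195: 35.0},
        "p2": {57: 38.0, 189: 25.0, 195: 33.0}}
bound = {"p1": {57: 5.0, 189: 30.0, 195: 2.0},
         "p2": {57: 4.0, 189: 10.0, 195: 3.0}}
residue_at = {"p1": {57: "H", 189: "D", 195: "S"},
              "p2": {57: "H", 189: "S", 195: "S"}}

ifr = {name: interface_residues(free[name], bound[name]) for name in free}
table = occupancy_table(ifr, residue_at)
```

In this toy case position 189 is at the interface only for `p2`, so the table records alternative populations per position rather than a single canonical residue set.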
Serine protease interfaces prefer polar (including glycine) residues (with some exceptions). Charged residues were found to be uniquely prevalent at the interfaces between the "miscellaneous-virus" subfamily and the three inhibitors. This prompts speculation about how important this difference in IFR characteristics is for maintaining virulence of those organisms.
Our work here provides a unique tool for structure/function relationship analysis as well as a compilation of indicators detailing how the specificity of various serine proteases may have been achieved and/or could be altered. It also indicates that the interface forming residues, which also determine the specificity of a serine protease subfamily, cannot be presented in a canonical way but rather as a matrix of alternative populations of amino acids occupying a variety of IFR positions.
PMCID: PMC2974730  PMID: 20961427
2.  Modeling and structural analysis of PA clan serine proteases 
BMC Research Notes  2012;5:256.
Serine proteases account for over a third of all known proteolytic enzymes; they are involved in a variety of physiological processes and are classified into clans sharing structural homology. The PA clan of endopeptidases is the most abundant and over two thirds of this clan is comprised of the S1 family of serine proteases, which bear the archetypal trypsin fold and have a catalytic triad in the order Histidine, Aspartate, Serine. These proteases have been studied in depth and many three dimensional structures have been experimentally determined. However, these structures mostly consist of bacterial and animal proteases, with a small number of plant and fungal proteases and as yet no structures have been determined for protozoa or archaea. The core structure and active site geometry of these proteases is of interest for many applications. This study investigated the structural properties of different S1 family serine proteases from a diverse range of taxa using molecular modeling techniques.
Our predicted models from protozoa, archaea, fungi and plants were combined with the experimentally determined structures of 16 S1 family members and used for analysis of the catalytic core. Amino acid sequences were submitted to SWISS-MODEL for homology-based structure prediction or the LOOPP server for threading-based structure prediction. Predicted models were refined using INSIGHT II and SCRWL and validated against experimental structures. Investigation of secondary structures and electrostatic surface potential was performed using MOLMOL. The structural geometry of the catalytic core shows clear deviations between taxa, but the relative positions of the catalytic triad residues were conserved. Some highly conserved residues potentially contributing to the stability of the structural core were identified. Evolutionary divergence was also exhibited by large variation in secondary structure features outside the core, differences in overall amino acid distribution, and unique surface electrostatic potential patterns between species.
Encompassing a wide range of taxa, our structural analysis provides an evolutionary perspective on S1 family serine proteases. Focusing on the common core containing the catalytic site of the enzyme, this analysis is beneficial for future molecular modeling strategies and structural analysis of serine protease models.
PMCID: PMC3434108  PMID: 22624962
Serine protease; PA clan; Homology; Threading; Modeling
3.  Intramolecular Interactions between the Protease and Structural Domains Are Important for the Functions of Serine Protease Autotransporters
Infection and Immunity  2010;78(8):3335-3345.
Autotransporter (AT) is a protein secretion pathway found in Gram-negative bacteria featuring a multidomain polypeptide with a signal sequence, a passenger domain, and a translocator domain. An AT subfamily named serine protease ATs of the family Enterobacteriaceae (SPATEs) is characterized by the presence of a conserved serine protease motif in the passenger domain which contributes to bacterial pathogenesis. The goal of the current study is to determine the importance of the passenger domain conserved residues in the SPATE proteolytic and adhesive functions using the temperature-sensitive hemagglutinin (Tsh) protein as our model. To begin, mutations of 21 fully conserved residues in the four passenger domain conserved motifs were constructed by PCR-based site-directed mutagenesis. Seventeen mutants exhibited a wild-type secretion level; among these mutants, eight displayed reduced proteolytic activities in Tsh-specific oligopeptide and mucin cleavage assays. These eight mutants also demonstrated lower affinities to extracellular matrix proteins, collagen IV, and fibronectin. These eight conserved residues were analyzed by molecular graphics modeling to demonstrate their intramolecular interactions with the catalytic triad and other key residues. Additional mutations were made to confirm the above interactions in order to demonstrate their significance to the SPATE functions. Altogether our data suggest that certain conserved residues in the SPATE passenger domain are important for both the proteolytic and adhesive activities of SPATE by maintaining the proper protein structure via intramolecular interactions between the protease and β-helical domains. Here, we provide new insight into the structure-function relationship of the SPATEs and the functional roles of their conserved residues.
PMCID: PMC2916258  PMID: 20479079
4.  WildSpan: mining structured motifs from protein sequences 
Automatic extraction of motifs from biological sequences is an important research problem in the study of molecular biology. For proteins, it is desirable to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually widely separated in sequence. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to greatly reduce the mining cost.
WildSpan is shown to efficiently find W-patterns containing conserved residues that are far apart in sequence. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and the performance of WildSpan. The protein-based mining mode of WildSpan is developed for discovering functional regions of a single protein by referring to a set of related sequences (e.g. its homologues). The discovered W-patterns are used to characterize the protein sequence and the results are compared with the conserved positions identified by multiple sequence alignment (MSA). The family-based mining mode of WildSpan is developed for extracting sequence signatures for a group of related proteins (e.g. a protein family) for protein function classification. In this situation, the discovered W-patterns are compared with PROSITE patterns as well as the patterns generated by three existing methods performing a similar task. Finally, analysis of the execution time of WildSpan reveals that the proposed pruning strategy is effective in improving the scalability of the proposed algorithm.
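To make the shape of a W-pattern concrete (short pattern blocks separated by large, variable-length gaps), the following Python sketch matches one hand-written pattern against a sequence using a regular expression. It only illustrates matching and is not the WildSpan mining algorithm; the block/gap representation, the function name `w_pattern_regex`, and the example pattern and sequences are assumptions made for illustration.

```python
import re

def w_pattern_regex(blocks, gaps):
    """Compile a structured motif into a regular expression.

    blocks: list of strings; a lowercase 'x' stands for one wildcard residue.
    gaps:   list of (min_len, max_len) ranges, one between consecutive blocks.
    """
    if len(gaps) != len(blocks) - 1:
        raise ValueError("need exactly one gap range between consecutive blocks")
    parts = []
    for i, block in enumerate(blocks):
        parts.append("".join("." if c == "x" else re.escape(c) for c in block))
        if i < len(gaps):
            lo, hi = gaps[i]
            parts.append(".{%d,%d}" % (lo, hi))
    return re.compile("".join(parts))

# Hypothetical pattern: block "HxG", then a gap of 5 to 12 residues, then block "DS".
pattern = w_pattern_regex(["HxG", "DS"], [(5, 12)])

hit = pattern.search("MK" + "HAG" + "L" * 7 + "DS" + "Q")   # gap of 7: inside the range
miss = pattern.search("MK" + "HAG" + "L" * 3 + "DS" + "Q")  # gap of 3: too short
```

Allowing a range for each gap is what lets a single pattern cover homologues whose functional residues sit at different distances from each other.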
The mining results of this study show that WildSpan is efficient and effective in discovering functional signatures of proteins directly from sequences. The proposed pruning strategy is effective in improving the scalability of WildSpan. The study also demonstrates that the W-patterns discovered by WildSpan provide useful information for characterizing protein sequences. The WildSpan executable and open source codes are available on the web (
PMCID: PMC3082213  PMID: 21453542
5.  Inferring Hypotheses on Functional Relationships of Genes: Analysis of the Arabidopsis thaliana Subtilase Gene Family 
PLoS Computational Biology  2005;1(4):e40.
The gene family of subtilisin-like serine proteases (subtilases) in Arabidopsis thaliana comprises 56 members, divided into six distinct subfamilies. Whereas the members of five subfamilies are similar to pyrolysins, two genes share stronger similarity to animal kexins. Mutant screens confirmed 144 T-DNA insertion lines with knockouts for 55 out of the 56 subtilases. Apart from SDD1, none of the confirmed homozygous mutants revealed any obvious visible phenotypic alteration during growth under standard conditions. Apart from this specific case, forward genetics gave us no hints about the function of the individual 54 non-characterized subtilase genes. Therefore, the main objective of our work was to overcome the shortcomings of the forward genetic approach and to infer alternative experimental approaches by using an integrative bioinformatics and biological approach. Computational analyses based on transcriptional co-expression and co-response pattern revealed at least two expression networks, suggesting that functional redundancy may exist among subtilases with limited similarity. Furthermore, two hubs were identified, which may be involved in signalling or may represent higher-order regulatory factors involved in responses to environmental cues. A particular enrichment of co-regulated genes with metabolic functions was observed for four subtilases possibly representing late responsive elements of environmental stress. The kexin homologs show stronger associations with genes of transcriptional regulation context. Based on the analyses presented here and in accordance with previously characterized subtilases, we propose three main functions of subtilases: involvement in (i) control of development, (ii) protein turnover, and (iii) action as downstream components of signalling cascades. Supplemental material is available in the Plant Subtilase Database (PSDB) ( , as well as from the CSB.DB (
The first complete plant genome sequence was available for Arabidopsis thaliana, a common weed. The number of genes in the Arabidopsis genome is estimated to be around 25,000. The functions of most of these genes are, however, still unknown. Many genes are grouped into gene families due to conserved sequences and predicted protein structures. In this article, the large subtilisin-like serine protease (subtilase) family of Arabidopsis is analysed. Although 56 subtilase genes have been identified in Arabidopsis, the function of only two subtilases is known. Analysis of mutants has revealed no further hints about the function of the other 54 subtilases. Here the authors present a novel approach to infer hypotheses about functions of the subtilase genes using computational analysis. Based on the analyses presented here and in accordance with previously characterized subtilases, they propose three main functions of subtilases: involvement in (i) control of development, (ii) protein degradation, and (iii) signalling. The results presented can be used to direct further analysis to elucidate functions of subtilases in plants.
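The idea of a transcriptional co-expression network with hubs can be sketched in a few lines of Python. The abstract does not say which similarity measure or cutoff was used, so the Pearson correlation, the 0.9 cutoff, the gene labels and the expression values below are all assumptions chosen for illustration only.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(profiles, cutoff=0.9):
    """Gene pairs whose expression profiles correlate at or above the cutoff."""
    return {(g, h) for g, h in combinations(sorted(profiles), 2)
            if pearson(profiles[g], profiles[h]) >= cutoff}

def degrees(edges):
    """Number of co-expression partners per gene; high-degree genes are candidate hubs."""
    deg = {}
    for g, h in edges:
        deg[g] = deg.get(g, 0) + 1
        deg[h] = deg.get(h, 0) + 1
    return deg

# Made-up expression profiles over four conditions for hypothetical genes.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.1, 2.1, 2.9, 4.2],
    "geneD": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(profiles)
deg = degrees(edges)
```

Genes that end up in the same connected group despite limited sequence similarity are the kind of candidates for functional redundancy that the abstract describes.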
PMCID: PMC1236819  PMID: 16193095
6.  SjAPI, the First Functionally Characterized Ascaris-Type Protease Inhibitor from Animal Venoms 
PLoS ONE  2013;8(3):e57529.
Serine protease inhibitors act as modulators of serine proteases, playing important roles in protecting animal toxin peptides from degradation. However, all known serine protease inhibitors discovered thus far from animal venom belong to the Kunitz-type subfamily, and whether there are other novel types of protease inhibitors in animal venom remains unclear.
Principal Findings
Here, by screening scorpion venom gland cDNA libraries, we identified the first Ascaris-type animal toxin family, which contains four members: Scorpiops jendeki Ascaris-type protease inhibitor (SjAPI), Scorpiops jendeki Ascaris-type protease inhibitor 2 (SjAPI-2), Chaerilus tricostatus Ascaris-type protease inhibitor (CtAPI), and Buthus martensii Ascaris-type protease inhibitor (BmAPI). The detailed characterization of Ascaris-type peptide SjAPI from the venom gland of scorpion Scorpiops jendeki was carried out. The mature peptide of SjAPI contains 64 residues and possesses a classical Ascaris-type cysteine framework reticulated by five disulfide bridges, different from all known protease inhibitors from venomous animals. Enzyme and inhibitor reaction kinetics experiments showed that recombinant SjAPI was a dual function peptide with α-chymotrypsin- and elastase-inhibiting properties. Recombinant SjAPI inhibited α-chymotrypsin with a Ki of 97.1 nM and elastase with a Ki of 3.7 μM, respectively. Bioinformatics analyses and chimera experiments indicated that SjAPI contained the unique short side chain functional residues “AAV” and might be a useful template to produce new serine protease inhibitors.
To our knowledge, SjAPI is the first functionally characterized animal toxin peptide with an Ascaris-type fold. The structural and functional diversity of animal toxins with protease-inhibiting properties suggested that bioactive peptides from animal venom glands might be a new source of protease inhibitors, which will accelerate the development of diagnostic and therapeutic agents for human diseases that target diverse proteases.
PMCID: PMC3606364  PMID: 23533574
7.  Mechanisms of Intramolecular Communication in a Hyperthermophilic Acylaminoacyl Peptidase: A Molecular Dynamics Investigation 
PLoS ONE  2012;7(4):e35686.
Protein dynamics and the underlying networks of intramolecular interactions and communicating residues within the three-dimensional (3D) structure are known to influence protein function and stability, as well as to modulate conformational changes and allostery. The acylaminoacyl peptidase (AAP) subfamily of enzymes belongs to a unique class of serine proteases, the prolyl oligopeptidase (POP) family, which has not yet been thoroughly investigated. POPs have a characteristic multidomain three-dimensional architecture with the active site at the interface of the C-terminal catalytic domain and a β-propeller domain, whose N-terminal region acts as a bridge to the hydrolase domain. In the present contribution, protein dynamics signatures of a hyperthermophilic acylaminoacyl peptidase (AAP) of the prolyl oligopeptidase (POP) family, as well as of a deletion variant and alanine mutants (I12A, V13A, V16A, L19A, I20A), are reported. In particular, we aimed at identifying residues crucial for long-range communication to the catalytic site or for promoting the conformational changes that switch between closed and open ApAAP conformations. Our investigation shows that the N-terminal α1-helix mediates structural intramolecular communication to the catalytic site, contributing to the maintenance of a proper functional architecture of the catalytic triad. The main determinants of the effects induced by the α1-helix are a subset of hydrophobic residues (V16, L19 and I20). Moreover, a subset of residues characterized by relevant interaction networks or coupled motions has been identified, which are likely to modulate the conformational properties at the interdomain interface.
PMCID: PMC3338720  PMID: 22558199
8.  Automated Protein Subfamily Identification and Classification 
PLoS Computational Biology  2007;3(8):e160.
Function prediction by homology is widely used to provide preliminary functional annotations for genes for which experimental evidence of function is unavailable or limited. This approach has been shown to be prone to systematic error, including percolation of annotation errors through sequence databases. Phylogenomic analysis avoids these errors in function prediction but has been difficult to automate for high-throughput application. To address this limitation, we present a computationally efficient pipeline for phylogenomic classification of proteins. This pipeline uses the SCI-PHY (Subfamily Classification in Phylogenomics) algorithm for automatic subfamily identification, followed by subfamily hidden Markov model (HMM) construction. A simple and computationally efficient scoring scheme using family and subfamily HMMs enables classification of novel sequences to protein families and subfamilies. Sequences representing entirely novel subfamilies are differentiated from those that can be classified to subfamilies in the input training set using logistic regression. Subfamily HMM parameters are estimated using an information-sharing protocol, enabling subfamilies containing even a single sequence to benefit from conservation patterns defining the family as a whole or in related subfamilies. SCI-PHY subfamilies correspond closely to functional subtypes defined by experts and to conserved clades found by phylogenetic analysis. Extensive comparisons of subfamily and family HMM performances show that subfamily HMMs dramatically improve the separation between homologous and non-homologous proteins in sequence database searches. Subfamily HMMs also provide extremely high specificity of classification and can be used to predict entirely novel subtypes. The SCI-PHY Web server at allows users to upload a multiple sequence alignment for subfamily identification and subfamily HMM construction. Biologists wishing to provide their own subfamily definitions can do so. 
Source code is available on the Web page. The Berkeley Phylogenomics Group PhyloFacts resource contains pre-calculated subfamily predictions and subfamily HMMs for more than 40,000 protein families and domains at
Author Summary
Predicting the function of a gene or protein (gene product) from its primary sequence is a major focus of many bioinformatics methods. In this paper, the authors present a three-stage computational pipeline for gene functional annotation in an evolutionary framework to reduce the systematic errors associated with the standard protocol (annotation transfer from predicted homologs). In the first stage, a functional hierarchy is estimated for each protein family and subfamilies are identified. In the second stage, hidden Markov models (HMMs) (a type of statistical model) are constructed for each subfamily to model both the family-defining and subfamily-specific signatures. In the third stage, subfamily HMMs are used to assign novel sequences to functional subtypes. Extensive experimental validation of these methods shows that predicted subfamilies correspond closely to functional subtypes identified by experts and to conserved clades in phylogenetic trees; that subfamily HMMs increase the separation between homologs and non-homologs in sequence database discrimination tests relative to the use of a single HMM for the family; and that specificity of classification of novel sequences to subfamilies using subfamily HMMs is near perfect (1.5% error rate when sequences are assigned to the top-scoring subfamily, and <0.5% error rate when logistic regression of scores is employed).
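The third stage described above (assign a sequence to its top-scoring subfamily, and use a logistic model to flag sequences that may belong to a novel subfamily) can be sketched as follows. This is not the SCI-PHY implementation: the abstract does not specify the regression features, so this sketch assumes the logistic model sees only the top score, and the weight, bias, threshold and all scores are hypothetical values.

```python
import math

def classify(scores, weight, bias, threshold=0.5):
    """Assign a sequence to its top-scoring subfamily unless it looks novel.

    scores: dict mapping subfamily name to an HMM score (higher is better).
    weight, bias: parameters of an assumed, already fitted logistic model that
                  maps the top score to the probability of a known subfamily.
    Returns (subfamily or None, probability).
    """
    best = max(scores, key=scores.get)
    p_known = 1.0 / (1.0 + math.exp(-(weight * scores[best] + bias)))
    return (best if p_known >= threshold else None), p_known

# Hypothetical scores of two query sequences against three subfamily HMMs.
strong = {"trypsin": 42.0, "chymotrypsin": 17.5, "elastase": 9.0}
weak = {"trypsin": 10.0, "chymotrypsin": 8.0, "elastase": 9.5}

label_strong, p_strong = classify(strong, weight=0.2, bias=-4.0)
label_weak, p_weak = classify(weak, weight=0.2, bias=-4.0)
```

Returning `None` for a low-probability hit keeps weakly matching sequences out of existing subfamilies, which is the role the summary attributes to the logistic regression step.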
PMCID: PMC1950344  PMID: 17708678
9.  Purification, Characterization, and Functional Role of a Novel Extracellular Protease from Pleurotus ostreatus 
A new extracellular protease (PoSl; Pleurotus ostreatus subtilisin-like protease) from P. ostreatus culture broth has been purified and characterized. PoSl is a monomeric glycoprotein with a molecular mass of 75 kDa, a pI of 4.5, and an optimum pH in the alkaline range. The inhibitory profile indicates that PoSl is a serine protease. The N-terminal and three tryptic peptide sequences of PoSl have been determined. The homology of one internal peptide with conserved sequence around the Asp residue of the catalytic triad in the subtilase family suggests that PoSl is a subtilisin-like protease. This hypothesis is further supported by the finding that PoSl hydrolysis sites of the insulin B chain match those of subtilisin. PoSl activity is positively affected by calcium. A 10-fold decrease in the Km value in the presence of calcium ions can reflect an induced structural change in the substrate recognition site region. Furthermore, Ca2+ binding slows PoSl autolysis, triggering the protein to form a more compact structure. These effects have already been observed for subtilisin and other serine proteases. Moreover, PoSl protease seems to play a key role in the regulation of P. ostreatus laccase activity by degrading and/or activating different isoenzymes.
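A short Python sketch can show what a 10-fold decrease in Km means for the reaction rate under Michaelis-Menten kinetics, v = Vmax·[S] / (Km + [S]). The Km values, substrate concentrations and the assumption that Vmax is unchanged by calcium are illustrative only and are not taken from the paper.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate at substrate concentration s."""
    return vmax * s / (km + s)

# Hypothetical values in arbitrary units; only the 10-fold Km ratio comes from the abstract.
KM_NO_CA = 1.0
KM_WITH_CA = 0.1
VMAX = 1.0  # assumed to be the same with and without calcium

def fold_gain(s):
    """How many times faster the reaction runs with calcium at concentration s."""
    return mm_rate(s, VMAX, KM_WITH_CA) / mm_rate(s, VMAX, KM_NO_CA)

gain_low = fold_gain(0.1)    # substrate well below the calcium-free Km
gain_high = fold_gain(10.0)  # substrate well above both Km values
```

Under these assumptions the benefit of the lower Km is large at low substrate concentration and nearly disappears once the enzyme is saturated.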
PMCID: PMC92935  PMID: 11375191
10.  Chikungunya nsP2 protease is not a papain-like cysteine protease and the catalytic dyad cysteine is interchangeable with a proximal serine 
Scientific Reports  2015;5:17125.
Chikungunya virus is the pathogenic alphavirus that causes chikungunya fever in humans. In the last decade millions of cases have been reported around the world from Africa to Asia to the Americas. The alphavirus nsP2 protein is multifunctional and is considered to be pivotal to viral replication, as the nsP2 protease activity is critical for proteolytic processing of the viral polyprotein during replication. Classically the alphavirus nsP2 protease is thought to be papain-like with the enzyme reaction proceeding through a cysteine/histidine catalytic dyad. We performed structure-function studies on the chikungunya nsP2 protease and show that the enzyme is not papain-like. Characterization of the catalytic dyad cysteine residue enabled us to identify a nearby serine that is catalytically interchangeable with the dyad cysteine residue. The enzyme retains activity upon alanine replacement of either residue but a replacement of both cysteine and serine residues results in no detectable activity. Protein dynamics appears to allow the use of either the cysteine or the serine residue in catalysis. This switchable dyad residue has not been previously reported for alphavirus nsP2 proteases and would have a major impact on the nsP2 protease as an anti-viral target.
PMCID: PMC4657084  PMID: 26597768
11.  Cold-Active Serine Alkaline Protease from the Psychrotrophic Bacterium Shewanella Strain Ac10: Gene Cloning and Enzyme Purification and Characterization 
The gene encoding serine alkaline protease (SapSh) of the psychrotrophic bacterium Shewanella strain Ac10 was cloned in Escherichia coli. The amino acid sequence deduced from the 2,442-bp nucleotide sequence revealed that the protein was 814 amino acids long and had an estimated molecular weight of 85,113. SapSh exhibited sequence similarities with members of the subtilisin family of proteases, and there was a high level of conservation in the regions around a putative catalytic triad consisting of Asp-30, His-65, and Ser-369. The amino acid sequence contained the following regions which were assigned on the basis of homology to previously described sequences: a signal peptide (26 residues), a propeptide (117 residues), and an extension up to the C terminus (about 250 residues). Another feature of SapSh is the fact that the space between His-65 and Ser-369 is approximately 150 residues longer than the corresponding spaces in other proteases belonging to the subtilisin family. SapSh was purified to homogeneity from the culture supernatant of E. coli recombinant cells by affinity chromatography with a bacitracin-Sepharose column. The recombinant SapSh (rSapSh) was found to have a molecular weight of about 44,000 and to be highly active in the alkaline region (optimum pH, around 9.0) when azocasein and synthetic peptides were used as substrates. rSapSh was characterized by its high levels of activity at low temperatures; it was five times more active than subtilisin Carlsberg at temperatures ranging from 5 to 15°C. The activation energy for hydrolysis of azocasein by rSapSh was much lower than the activation energy for hydrolysis of azocasein by the subtilisin. However, rSapSh was far less stable than the subtilisin.
PMCID: PMC91069  PMID: 9925590
12.  The Plasmodium serine-type SERA proteases display distinct expression patterns and non-essential in vivo roles during life cycle progression of the malaria parasite 
Cellular Microbiology  2010;12(6):725-739.
Parasite proteases play key roles in several fundamental steps of the Plasmodium life cycle, including haemoglobin degradation, host cell invasion and parasite egress. Plasmodium exit from infected host cells appears to be mediated by a class of papain-like cysteine proteases called ‘serine repeat antigens’ (SERAs). A SERA subfamily, represented by Plasmodium falciparum SERA5, contains an atypical active site serine residue instead of a catalytic cysteine. Members of this SERAser subfamily are abundantly expressed in asexual blood stages, rendering them attractive drug and vaccine targets. In this study, we show by antibody localization and in vivo fluorescent tagging with the red fluorescent protein mCherry that the two P. berghei serine-type family members, PbSERA1 and PbSERA2, display differential expression towards the final stages of merozoite formation. Via targeted gene replacement, we generated single and double gene knockouts of the P. berghei SERAser genes. These loss-of-function lines progressed normally through the parasite life cycle, suggesting a specialized, non-vital role for serine-type SERAs in vivo. Parasites lacking PbSERAser showed increased expression of the cysteine-type PbSERA3. Compensatory mechanisms between distinct SERA subfamilies may thus explain the absence of phenotypical defect in SERAser disruptants, and challenge the suitability to develop potent antimalarial drugs based on specific inhibitors of Plasmodium serine-type SERAs.
PMCID: PMC2878606  PMID: 20039882
13.  A Computational Module Assembled from Different Protease Family Motifs Identifies PI PLC from Bacillus cereus as a Putative Prolyl Peptidase with a Serine Protease Scaffold 
PLoS ONE  2013;8(8):e70923.
Proteolytic enzymes have evolved several mechanisms to cleave peptide bonds. These distinct types have been systematically categorized in the MEROPS database. While a BLAST search on these proteases identifies homologous proteins, sequence alignment methods often fail to identify relationships arising from convergent evolution, exon shuffling, and modular reuse of catalytic units. We have previously established a computational method to detect functions in proteins based on the spatial and electrostatic properties of the catalytic residues (CLASP). CLASP identified a promiscuous serine protease scaffold in alkaline phosphatases (AP) and a scaffold recognizing a β-lactam (imipenem) in a cold-active Vibrio AP. Subsequently, we defined a methodology to quantify promiscuous activities in a wide range of proteins. Here, we assemble a module which encapsulates the multifarious motifs used by protease families listed in the MEROPS database. Since APs and proteases are an integral component of outer membrane vesicles (OMV), we sought to query other OMV proteins, like phospholipase C (PLC), using this search module. Our analysis indicated that phosphoinositide-specific PLC from Bacillus cereus is a serine protease. This was validated by protease assays, mass spectrometry and by inhibition of the native phospholipase activity of PI-PLC by the well-known serine protease inhibitor AEBSF (IC50 = 0.018 mM). Edman degradation analysis linked the specificity of the protease activity to a proline in the amino terminal, suggesting that the PI-PLC is a prolyl peptidase. Thus, we propose a computational method of extending protein families based on the spatial and electrostatic congruence of active site residues.
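The notion of spatial congruence between active-site residues can be illustrated by comparing the pairwise distances within a reference motif to those within a candidate set of residues. The sketch below covers only this geometric part and is not the CLASP scoring function, which according to the abstract also uses electrostatic properties; the function names and all coordinates are made up for illustration.

```python
import math
from itertools import combinations

def pairwise_distances(coords):
    """Distances between every pair of points, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def distance_deviation(motif, candidate):
    """Root-mean-square difference between corresponding pairwise distances.

    Both arguments list 3D coordinates of matched residues in the same order.
    Because only internal distances are compared, no superposition is needed.
    """
    if len(motif) != len(candidate):
        raise ValueError("motif and candidate must contain the same number of residues")
    dm = pairwise_distances(motif)
    dc = pairwise_distances(candidate)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(dm, dc)) / len(dm))

# Made-up coordinates (in angstroms) for a three-residue reference motif.
motif = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
shifted = [(1.0, 1.0, 1.0), (4.0, 1.0, 1.0), (1.0, 5.0, 1.0)]    # same geometry, translated
stretched = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (0.0, 4.0, 0.0)]  # one residue moved away

same = distance_deviation(motif, shifted)
different = distance_deviation(motif, stretched)
```

A candidate with a small deviation has the same residue arrangement regardless of where it sits in the structure, which is why such a comparison can find relationships that sequence alignment misses.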
PMCID: PMC3733634  PMID: 23940667
14.  Expression profiling and comparative analyses of seven midgut serine proteases from the yellow fever mosquito, Aedes aegypti 
Journal of insect physiology  2010;56(7):736-744.
Aedes aegypti utilizes blood for energy production, egg maturation and replenishment of maternal reserves. The principal midgut enzymes responsible for bloodmeal digestion are endoproteolytic serine-type proteases within the S1.A subfamily. While there are hundreds of serine protease-like genes in the A. aegypti genome, only five are known to be expressed in the midgut. We describe the cloning, sequencing and expression profiling of seven additional serine proteases and provide a genomic and phylogenetic assessment of these findings. Of the seven genes, four are constitutively expressed and three are transcriptionally induced upon blood feeding. The amount of transcriptional induction is strongly correlated among these genes. Alignments reveal that, in general, the conserved catalytic triad, active site and accessory catalytic residues are maintained in these genes, and phylogenetic analysis shows that these genes fall within three distinct clades: trypsins, chymotrypsins and serine collagenases. Interestingly, a previously described trypsin consistently grouped with other serine collagenases in phylogenetic analyses. These results suggest that multiple gene duplications have arisen within the S1.A subfamily of midgut serine proteases and/or that A. aegypti has evolved an array of proteases with a broad range of substrate specificities for rapid, efficient digestion of bloodmeals.
PMCID: PMC2878907  PMID: 20100490
Aedes aegypti; Midgut; Serine proteases
15.  Molecular models of NS3 protease variants of the Hepatitis C virus 
Hepatitis C virus (HCV) currently infects approximately three percent of the world population. In view of the lack of vaccines against HCV, there is an urgent need for an efficient treatment of the disease by an effective antiviral drug. Rational drug design has not been the primary way of discovering major therapeutics. Nevertheless, there are reports of success in the development of inhibitors using a structure-based approach. Possible targets for drug development against HCV include the NS3 protease variants. Based on the three-dimensional structure of these variants we expect to identify new NS3 protease inhibitors. In order to speed up the modeling process, all NS3 protease variant models were generated on a Beowulf cluster. The potential of structural bioinformatics for the development of new antiviral drugs is discussed.
The atomic coordinates of the crystallographic structures 1CU1 and 1DY9 were used as starting models for modeling the NS3 protease variant structures. The NS3 protease variant structures are composed of six subdomains, which occur in sequence along the polypeptide chain. The protease domain exhibits the dual beta-barrel fold that is common among members of the chymotrypsin serine protease family. The helicase domain contains two structurally related beta-alpha-beta subdomains and a third subdomain of seven helices and three short beta strands. The latter domain is usually referred to as the helicase alpha-helical subdomain. The rmsd values of bond lengths and bond angles, the average G-factor and Verify 3D values are presented for the NS3 protease variant structures.
This project increases the certainty that homology modeling is a useful tool in structural biology and that it can be very valuable in annotating genome sequence information and contributing to the structural and functional genomics of viruses. The structural models will be used to guide future efforts in the structure-based drug design of a new generation of NS3 protease variant inhibitors. All models in the database are publicly accessible via our interactive website, providing us with a large number of structural models for use in protein-ligand docking analysis.
PMCID: PMC547903  PMID: 15663787
16.  Granzymes: a family of lymphocyte granule serine proteases 
Genome Biology  2001;2(12):reviews3014.1-reviews3014.7.
Granzymes, a family of serine proteases, are expressed exclusively by cytotoxic T lymphocytes and natural killer (NK) cells, components of the immune system that protect higher organisms against viral infection and cellular transformation. Following receptor-mediated conjugate formation between a granzyme-containing cell and an infected or transformed target cell, granzymes enter the target cell via endocytosis and induce apoptosis. Granzyme B is the most powerful pro-apoptotic member of the granzyme family. Like caspases, cysteine proteases that play an important role in apoptosis, it can cleave proteins after acidic residues, especially aspartic acid. Other granzymes may serve additional functions, and some may not induce apoptosis. Granzymes have been well characterized only in humans and rodents, and can be grouped into three subfamilies according to substrate specificity: members of the granzyme family that have enzymatic activity similar to the serine protease chymotrypsin are encoded by a gene cluster termed the 'chymase locus'; granzymes with trypsin-like specificities are encoded by the 'tryptase locus'; and a third subfamily cleaves after unbranched hydrophobic residues, especially methionine, and is encoded by the 'Met-ase locus'. All granzymes are synthesized as zymogens and, after clipping of the leader peptide, maximal enzymatic activity is achieved by removal of an amino-terminal dipeptide. They can all be blocked by serine protease inhibitors, and a new group of inhibitors has recently been identified: serpins, some of which are specific for granzymes. Future studies of serpins may bring insights into how cells that synthesize granzymes are protected from inadvertent cell suicide.
PMCID: PMC138995  PMID: 11790262
17.  Dimerization-Induced Allosteric Changes of the Oxyanion-Hole Loop Activate the Pseudorabies Virus Assemblin pUL26N, a Herpesvirus Serine Protease 
PLoS Pathogens  2015;11(7):e1005045.
Herpesviruses encode a characteristic serine protease with a unique fold and an active site that comprises the unusual triad Ser-His-His. The protease is essential for viral replication and as such constitutes a promising drug target. In solution, a dynamic equilibrium exists between an inactive monomeric and an active dimeric form of the enzyme, which is believed to play a key regulatory role in the orchestration of proteolysis and capsid assembly. Currently available crystal structures of herpesvirus proteases correspond either to the dimeric state or to complexes with peptide mimetics that alter the dimerization interface. In contrast, the structure of the native monomeric state has remained elusive. Here, we present the three-dimensional structures of native monomeric, active dimeric, and diisopropyl fluorophosphate-inhibited dimeric protease derived from pseudorabies virus, an alphaherpesvirus of swine. These structures, solved by X-ray crystallography to respective resolutions of 2.05, 2.10 and 2.03 Å, allow a direct comparison of the main conformational states of the protease. In the dimeric form, a functional oxyanion hole is formed by a loop of 10 amino-acid residues encompassing two consecutive arginine residues (Arg136 and Arg137); both are strictly conserved throughout the herpesviruses. In the monomeric form, the top of the loop is shifted by approximately 11 Å, resulting in a complete disruption of the oxyanion hole and loss of activity. The dimerization-induced allosteric changes described here form the physical basis for the concentration-dependent activation of the protease, which is essential for proper virus replication. Small-angle X-ray scattering experiments confirmed a concentration-dependent equilibrium of monomeric and dimeric protease in solution.
Author Summary
Herpesviruses encode a unique serine protease, which is essential for herpesvirus capsid maturation and is therefore an interesting target for drug development. In solution, this protease exists in an equilibrium between an inactive monomeric and an active dimeric form. All currently available crystal structures of herpesvirus proteases represent complexes, particularly dimers. Here we show the first three-dimensional structure of the native monomeric form in addition to the native and the chemically inactivated dimeric form of the protease derived from the porcine herpesvirus pseudorabies virus. Comparison of the monomeric and dimeric forms allows predictions about the structural changes that occur during dimerization and sheds light on the process of protease activation. These new crystal structures provide a rational basis for developing drugs that prevent dimerization and therefore impede herpesvirus capsid maturation. Furthermore, it is likely that this mechanism is conserved throughout the herpesviruses.
PMCID: PMC4498786  PMID: 26161660
18.  Active Site Detection by Spatial Conformity and Electrostatic Analysis—Unravelling a Proteolytic Function in Shrimp Alkaline Phosphatase 
PLoS ONE  2011;6(12):e28470.
Computational methods are increasingly gaining importance as an aid in identifying active sites. Most of these methods use structural information to supplement sequence conservation-based analyses. Development of tools that compute electrostatic potentials has further improved our ability to characterize the active site residues in proteins. We have described a computational methodology for detecting active sites based on structural and electrostatic conformity - CataLytic Active Site Prediction (CLASP). In our pipelined model, the physical 3D signature of any particular enzymatic function, as defined by its active sites, is used to obtain spatially congruent matches. While previous work has revealed that catalytic residues have large pKa deviations from standard values, we show that for a given enzymatic activity, the electrostatic potential differences (PD) between analogous residue pairs in an active site taken from different proteins of the same family are similar. False positives in spatially congruent matches are further pruned by PD analysis, where cognate pairs with large deviations are rejected. We first present the results of active site prediction by CLASP for two enzymatic activities - β-lactamases and serine proteases, two of the most extensively investigated enzymes. The results of CLASP analysis on motifs extracted from the Catalytic Site Atlas (CSA) are also presented in order to demonstrate its ability to accurately classify any protein, putative or otherwise, with known structure. The source code and database are made available at (URL missing). Subsequently, we probed alkaline phosphatases (AP), one of the well-known promiscuous enzymes, for additional activities. Such a search has led us to predict a hitherto unknown function of shrimp alkaline phosphatase (SAP), where the protein acts as a protease. Finally, we present experimental evidence for the prediction by CLASP by showing that SAP indeed has protease activity in vitro.
PMCID: PMC3234256  PMID: 22174814
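The two-stage filter described in the abstract above (spatial congruence first, then pruning by potential difference) can be illustrated with a minimal sketch. This is not the CLASP implementation: the function names, the dictionary layout, and both tolerance values are hypothetical, and the potentials are assumed to have been computed beforehand by some external electrostatics tool.

```python
from itertools import combinations
from math import dist


def pairwise_distances(coords):
    """Distances between every pair of residue positions, in a fixed order."""
    return [dist(a, b) for a, b in combinations(coords, 2)]


def pairwise_pd(potentials):
    """Potential difference (PD) for every residue pair, in the same order."""
    return [a - b for a, b in combinations(potentials, 2)]


def matches_motif(motif, candidate, max_dist_dev=1.0, max_pd_dev=100.0):
    """Return True if the candidate is congruent with the motif.

    Both arguments are dicts with 'coords' (list of xyz tuples) and
    'potentials' (one value per residue), listed in the same residue order.
    The tolerances are illustrative assumptions, not published cutoffs.
    """
    # Stage 1: spatial congruence of all pairwise distances.
    d_ref = pairwise_distances(motif["coords"])
    d_cand = pairwise_distances(candidate["coords"])
    if any(abs(r - c) > max_dist_dev for r, c in zip(d_ref, d_cand)):
        return False
    # Stage 2: prune matches whose cognate pairs deviate strongly in PD.
    pd_ref = pairwise_pd(motif["potentials"])
    pd_cand = pairwise_pd(candidate["potentials"])
    return all(abs(r - c) <= max_pd_dev for r, c in zip(pd_ref, pd_cand))


# Toy three-residue motif; all numbers are made up for illustration.
motif = {"coords": [(0, 0, 0), (3, 0, 0), (0, 4, 0)],
         "potentials": [-100.0, 50.0, 20.0]}
shifted = {"coords": [(1, 1, 1), (4, 1, 1), (1, 5, 1)],
           "potentials": [-90.0, 60.0, 10.0]}
print(matches_motif(motif, shifted))
```

Comparing pairwise distances rather than raw coordinates makes the spatial test independent of how each structure is positioned, which is why the translated toy candidate still matches.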
19.  Structure-Function Analysis of Diacylglycerol Acyltransferase Sequences from 70 Organisms 
BMC Research Notes  2011;4:249.
Diacylglycerol acyltransferase families (DGATs) catalyze the final and rate-limiting step of triacylglycerol (TAG) biosynthesis in eukaryotic organisms. Understanding the roles of DGATs will help to create transgenic plants with value-added properties and provide clues for therapeutic intervention for obesity and related diseases. The objective of this analysis was to identify conserved sequence motifs and amino acid residues for better understanding of the structure-function relationship of these important enzymes.
A total of 117 DGAT sequences from 70 organisms, including plants, animals, fungi and human, were obtained from a database search using tung tree DGATs. Phylogenetic analysis separates these proteins into DGAT1 and DGAT2 subfamilies. These DGATs are integral membrane proteins with more than 40% of the total amino acid residues being hydrophobic. They have similar properties and amino acid composition, except that DGAT1s are approximately 20 kDa larger than DGAT2s. DGAT1s and DGAT2s have 41 and 16 completely conserved amino acid residues, respectively, although only two of them are shared by all DGATs. These residues are distributed in 7 and 6 sequence blocks for DGAT1s and DGAT2s, respectively, and located at the carboxyl termini, suggesting the location of the catalytic domains. These conserved sequence blocks do not contain the putative neutral lipid-binding domain, mitochondrial targeting signal, or ER retrieval motif. The importance of conserved residues has been demonstrated by site-directed and natural mutants.
This study has identified conserved sequence motifs and amino acid residues in all 117 DGATs and the two subfamilies. None of the completely conserved residues in DGAT1s and DGAT2s is present in recently reported isoforms in the multiple sequence alignment, raising the important question of how proteins with completely different amino acid sequences could perform the same biochemical reaction. The sequence analysis should facilitate the study of the structure-function relationship of DGATs, with the ultimate goal of identifying critical amino acid residues for engineering superior enzymes in metabolic engineering and selecting enzyme inhibitors for therapeutic application in obesity and related diseases.
PMCID: PMC3157451  PMID: 21777418
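As a rough illustration of what "completely conserved" means in the abstract above, the following sketch scans a multiple sequence alignment for columns in which every sequence carries the same residue. The function name and the toy alignment are hypothetical, and the study's own alignment software and parameters are not reproduced here.

```python
def conserved_columns(aligned):
    """Return (index, residue) for every fully conserved alignment column.

    `aligned` is a list of equal-length strings from a multiple sequence
    alignment, with '-' marking gaps. A column counts as conserved only if
    all sequences share one residue and none has a gap there.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in aligned}
        if len(column) == 1 and "-" not in column:
            conserved.append((i, aligned[0][i]))
    return conserved


# Toy alignment, made up for illustration (not real DGAT sequences).
toy = ["MKHLA",
       "MRHLG",
       "MKH-A"]
print(conserved_columns(toy))
```

Running the same function separately on each subfamily and then on the combined alignment would, under these assumptions, mirror the comparison between subfamily-specific and shared conserved residues.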
20.  Membrane immersion allows rhomboid proteases to achieve specificity by reading transmembrane segment dynamics 
eLife  2012;1:e00173.
Rhomboid proteases reside within cellular membranes, but the advantage of this unusual environment is unclear. We discovered membrane immersion allows substrates to be identified in a fundamentally-different way, based initially upon exposing ‘masked’ conformational dynamics of transmembrane segments rather than sequence-specific binding. EPR and CD spectroscopy revealed that the membrane restrains rhomboid gate and substrate conformation to limit proteolysis. True substrates evolved intrinsically-unstable transmembrane helices that both become unstructured when not supported by the membrane, and facilitate partitioning into the hydrophilic, active-site environment. Accordingly, manipulating substrate and gate dynamics in living cells shifted cleavage sites in a manner incompatible with extended sequence binding, but correlated with a membrane-and-helix-exit propensity scale. Moreover, cleavage of diverse non-substrates was provoked by single-residue changes that destabilize transmembrane helices. Membrane immersion thus bestows rhomboid proteases with the ability to identify substrates primarily based on reading their intrinsic transmembrane dynamics.
eLife digest
Proteases are enzymes that break the peptide bonds that hold proteins together, and have a central role in many physiological processes, including digestion, blood clotting and programmed cell death. An important characteristic of proteases is that they are highly selective, only cutting proteins that contain well-defined sequences of amino acids in accessible regions. Proteases that are soluble in water have been studied for over a century and are now well understood, as are proteases that need to be tethered to the membrane of a cell to work properly.
In 1997 researchers discovered a protease that was immersed in the cell membrane, and it soon became clear that these intramembrane proteases were widespread and involved in a wide range of processes in cells. Examples of intramembrane proteases include γ-secretase, which is implicated in Alzheimer's disease, and various site-2 proteases that regulate pathogenic circuits in bacteria.
There are many similarities between soluble and intramembrane proteases. However, given that intramembrane proteases evolved within the hydrophobic environment of the membrane, whereas soluble proteases evolved in an aqueous environment, there should also be significant differences between them. The best understood intramembrane proteases in terms of their biochemistry are probably the rhomboid proteases. However, most studies of their function have been performed in detergent systems rather than in real membranes.
Moin and Urban now report that the main strategy used by rhomboid proteases to identify the proteins that they selectively cut is completely different from that used by soluble proteases. Through a combination of biochemical and spectroscopic methods, they have discovered that rhomboid proteases identify the proteins they act on mainly by detecting changes in dynamic behavior: only those proteins that lose a stable helical structure when they exit the lipid phase to interact with the rhomboid protease will be cut by the rhomboid protease. Soluble proteases, on the other hand, achieve specificity by looking for proteins with a particular sequence of amino acids. The novel strategy used by rhomboid proteases allows them to patrol the membrane for unstable helices and selectively cut them. This discovery provides the first explanation of why these complicated enzymes evolved to have active sites immersed within the cell membrane.
PMCID: PMC3494066  PMID: 23150798
intramembrane proteolysis; rhomboid protease; pathogen; D. melanogaster; E. coli; Human
21.  Modeling and structural analysis of evolutionarily diverse S8 family serine proteases 
Bioinformation  2011;7(5):239-245.
Serine proteases are an abundant class of enzymes that are involved in a wide range of physiological processes and are classified into clans sharing structural homology. The active site of the subtilisin-like clan contains a catalytic triad in the order Asp, His, Ser (S8 family) or a catalytic tetrad in the order Glu, Asp and Ser (S53 family). The core structure and active site geometry of these proteases are of interest for many applications. The aim of this study was to investigate the structural properties of different S8 family serine proteases from a diverse range of taxa using molecular modeling techniques. In conjunction with 12 experimentally determined three-dimensional structures of S8 family members, our predicted structures from an archaeon, a protozoan and a plant were used for analysis of the catalytic core. Amino acid sequences were obtained from the MEROPS database and submitted to the LOOPP server for threading-based structure prediction. The predicted structures were refined and validated using PROCHECK, SCWRL and MODELYN. Investigation of secondary structures and electrostatic surface potential was performed using MOLMOL. Encompassing a wide range of taxa, our structural analysis provides an evolutionary perspective on S8 family serine proteases. Focusing on the common core containing the catalytic site of the enzyme, the analysis presented here is beneficial for future molecular modeling strategies and structure-based rational drug design.
PMCID: PMC3218418  PMID: 22125392
serine protease; SB clan; S8 family; homology; threading; modeling
22.  Protein Phosphatase 2B (PP2B, Calcineurin) in Paramecium: Partial Characterization Reveals That Two Members of the Unusually Large Catalytic Subunit Family Have Distinct Roles in Calcium-Dependent Processes 
Eukaryotic Cell  2010;9(7):1049-1063.
We characterized the calcineurin (CaN) gene family, including the subunits CaNA and CaNB, based upon sequence information obtained from the Paramecium genome project. Paramecium tetraurelia has seven subfamilies of the catalytic CaNA subunit and one subfamily of the regulatory CaNB subunit, with each subfamily having two members of considerable identity on the amino acid level (≥55% between subfamilies, ≥94% within CaNA subfamilies, and full identity in the CaNB subfamily). Within CaNA subfamily members, the catalytic domain and the CaNB binding region are highly conserved and molecular modeling revealed a three-dimensional structure almost identical to a human ortholog. At 14 members, the size of the CaNA family is unprecedented, and we hypothesized that the different CaNA subfamily members were not strictly redundant and that at least some fulfill different roles in the cell. This was tested by selecting two phylogenetically distinct members of this large family for posttranscriptional silencing by RNA interference. The two targets resulted in differing effects in exocytosis, calcium dynamics, and backward swimming behavior that supported our hypothesis that the large, highly conserved CaNA family members are not strictly redundant and that at least two members have evolved diverse but overlapping functions. In sum, the occurrence of CaN in Paramecium spp., although disputed in the past, has been established on a molecular level. Its role in exocytosis and ciliary beat regulation in a protozoan, as well as in more complex organisms, suggests that these roles for CaN were acquired early in the evolution of this protein family.
PMCID: PMC2901675  PMID: 20435698
23.  GB virus B and hepatitis C virus NS3 serine proteases share substrate specificity. 
Journal of Virology  1997;71(7):4985-4989.
GB virus B (GBV-B) is a recently discovered virus responsible for hepatitis in tamarins (Saguinus species). GBV-B belongs to the Flaviviridae family and is closely related to the human pathogen hepatitis C virus (HCV). Nonstructural protein 3 (NS3) of HCV has been shown to encompass a serine protease domain required for viral maturation. GBV-B and HCV share only about 30% of the amino acid sequence within the NS3 protease domain. The catalytic triad is conserved, and the residue Phe-154, presumed to be a crucial amino acid for determining the S1 specificity pocket of the HCV NS3 protease, is also conserved. We have expressed a synthetic gene encoding the GBV-B NS3 protease domain in Escherichia coli and have characterized the purified recombinant protein for its activity on HCV substrates. We have shown that the NS3 region of the GBV-B genome actually encodes a serine protease that, despite the low sequence homology, shares substrate specificity with the HCV NS3 protease.
PMCID: PMC191730  PMID: 9188562
24.  Trypsin- and Chymotrypsin-Like Serine Proteases in Schistosoma mansoni – ‘The Undiscovered Country’ 
Blood flukes (Schistosoma spp.) are parasites that can survive for years or decades in the vasculature of permissive mammalian hosts, including humans. Proteolytic enzymes (proteases) are crucial for successful parasitism, including aspects of invasion, maturation and reproduction. Most attention has focused on the ‘cercarial elastase’ serine proteases that facilitate skin invasion by infective schistosome larvae, and the cysteine and aspartic proteases that worms use to digest the blood meal. Apart from the cercarial elastases, information regarding other S. mansoni serine proteases (SmSPs) is limited. To address this, we investigated SmSPs using genomic, transcriptomic, phylogenetic and functional proteomic approaches.
Methodology/Principal Findings
Genes encoding five distinct SmSPs, termed SmSP1 - SmSP5, some of which comprise disparate protein domains, were retrieved from the S. mansoni genome database and annotated. Reverse transcription quantitative PCR (RT-qPCR) in various schistosome developmental stages indicated complex expression patterns for SmSPs, including their constituent protein domains. SmSP2 stood apart as being massively expressed in schistosomula and adult stages. Phylogenetic analysis segregated SmSPs into diverse clusters of family S1 proteases. SmSP1 to SmSP4 are trypsin-like proteases, whereas SmSP5 is chymotrypsin-like. In agreement, trypsin-like activities were shown to predominate in eggs, schistosomula and adults using peptidyl fluorogenic substrates. SmSP5 is particularly novel in the phylogenetics of family S1 schistosome proteases, as it is part of a cluster of sequences that fill a gap between the highly divergent cercarial elastases and other family S1 proteases.
Our series of post-genomics analyses clarifies the complexity of schistosome family S1 serine proteases and highlights their interrelationships, including the cercarial elastases and, not least, the identification of a ‘missing-link’ protease cluster, represented by SmSP5. A framework is now in place to guide the characterization of individual proteases, their stage-specific expression and their contributions to parasitism, in particular, their possible modulation of host physiology.
Author Summary
Schistosomes are blood flukes that live in the blood system and cause chronic and debilitating infection in hundreds of millions of people. Proteolytic enzymes (proteases) produced by the parasite allow it to survive and reproduce. We focused on understanding the repertoire of trypsin- and chymotrypsin-like Schistosoma mansoni serine proteases (SmSPs) using a variety of genomic, bioinformatics, RNA- and protein-based techniques. We identified five SmSPs that are produced at different stages of the parasite's development. Based on bioinformatics and cleavage preferences for small peptide substrates, SmSP1 to SmSP4 are trypsin-like, whereas SmSP5 is chymotrypsin-like. Interestingly, SmSP5 forms part of a ‘missing link’ group of enzymes between the specialized chymotrypsin-like ‘cercarial elastases’ that help the parasite invade human skin and the more typical chymotrypsins and trypsins found in nature. Our findings form a basis for further exploration of the functions of the individual enzymes, including their possible contributions to influencing host physiology.
PMCID: PMC3967958  PMID: 24676141
25.  Evolutionary Loss of Activity in De-Ubiquitylating Enzymes of the OTU Family 
PLoS ONE  2015;10(11):e0143227.
Understanding the function and specificity of de-ubiquitylating enzymes (DUBs) is a major goal of current research, since DUBs are key regulators of ubiquitylation events and have been shown to be mutated in human diseases. Most DUBs are cysteine proteases, relying on a catalytic triad of cysteine, histidine and aspartate to cleave the isopeptide bond between two ubiquitin units in a poly-ubiquitin chain. We have discovered that the two Drosophila melanogaster homologues of human OTUD4, CG3251 and Otu, contain a serine instead of a cysteine in the catalytic OTU (ovarian tumor) domain. DUBs that are serine proteases instead of cysteine- or metallo-proteases have not been described. In line with this, neither the CG3251 nor the Otu protein was able to cleave ubiquitin chains. Re-introduction of a cysteine in the catalytic center did not render the enzymes active, indicating that further critical features for ubiquitin binding or cleavage have been lost in these proteins. Sequence analysis of OTUD4 homologues from various other species showed that within this OTU subfamily, loss of the catalytic cysteine has occurred frequently in presumably independent events, as have gene duplications or triplications, suggesting DUB-independent functions of OTUD4 proteins. Using an in vivo RNAi approach, we show that CG3251 might function in the regulation of Inhibitor of Apoptosis (IAP)-antagonist-induced apoptosis, presumably in a DUB-independent manner.
PMCID: PMC4654579  PMID: 26588485
