Related Articles
The gene encoding serine alkaline protease (SapSh) of the psychrotrophic bacterium Shewanella strain Ac10 was cloned in Escherichia coli. The amino acid sequence deduced from the 2,442-bp nucleotide sequence revealed that the protein was 814 amino acids long and had an estimated molecular weight of 85,113. SapSh exhibited sequence similarities with members of the subtilisin family of proteases, and there was a high level of conservation in the regions around a putative catalytic triad consisting of Asp-30, His-65, and Ser-369. The amino acid sequence contained the following regions which were assigned on the basis of homology to previously described sequences: a signal peptide (26 residues), a propeptide (117 residues), and an extension up to the C terminus (about 250 residues). Another feature of SapSh is the fact that the space between His-65 and Ser-369 is approximately 150 residues longer than the corresponding spaces in other proteases belonging to the subtilisin family. SapSh was purified to homogeneity from the culture supernatant of E. coli recombinant cells by affinity chromatography with a bacitracin-Sepharose column. The recombinant SapSh (rSapSh) was found to have a molecular weight of about 44,000 and to be highly active in the alkaline region (optimum pH, around 9.0) when azocasein and synthetic peptides were used as substrates. rSapSh was characterized by its high levels of activity at low temperatures; it was five times more active than subtilisin Carlsberg at temperatures ranging from 5 to 15°C. The activation energy for hydrolysis of azocasein by rSapSh was much lower than the activation energy for hydrolysis of azocasein by the subtilisin. However, rSapSh was far less stable than the subtilisin.
PMCID: PMC91069
PMID: 9925590
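The spacing claim in the SapSh abstract above can be checked with simple arithmetic. The sketch below takes the SapSh triad positions from the abstract; the subtilisin BPN' reference positions (Asp-32, His-64, Ser-221) are an assumption brought in for comparison and are not stated in the abstract.

```python
# Catalytic-triad positions (Asp, His, Ser) as residue numbers.
SAPSH = {"Asp": 30, "His": 65, "Ser": 369}  # from the abstract above

# Assumption: conventional subtilisin BPN' numbering, used here only as a
# reference point; the abstract does not name a specific comparison enzyme.
SUBTILISIN_BPN = {"Asp": 32, "His": 64, "Ser": 221}


def his_ser_spacing(triad):
    """Number of residues between the catalytic His and the catalytic Ser."""
    return triad["Ser"] - triad["His"]


extra = his_ser_spacing(SAPSH) - his_ser_spacing(SUBTILISIN_BPN)
print("SapSh His-Ser spacing:", his_ser_spacing(SAPSH))
print("Reference His-Ser spacing:", his_ser_spacing(SUBTILISIN_BPN))
print("Extra residues in SapSh:", extra)
```

Under this assumed reference, the difference is in line with the "approximately 150 residues longer" stated in the abstract.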
GB virus B (GBV-B) is a recently discovered virus responsible for hepatitis in tamarins (Saguinus species). GBV-B belongs to the Flaviviridae family and is closely related to the human pathogen hepatitis C virus (HCV). Nonstructural protein 3 (NS3) of HCV has been shown to encompass a serine protease domain required for viral maturation. GBV-B and HCV share only about 30% of the amino acid sequence within the NS3 protease domain. The catalytic triad is conserved, and the residue Phe-154, presumed to be a crucial amino acid for determining the S1 specificity pocket of the HCV NS3 protease, is also conserved. We have expressed a synthetic gene encoding the GBV-B NS3 protease domain in Escherichia coli and have characterized the purified recombinant protein for its activity on HCV substrates. We have shown that the NS3 region of the GBV-B genome actually encodes a serine protease that, despite the low sequence homology, shares substrate specificity with the HCV NS3 protease.
PMCID: PMC191730
PMID: 9188562
Background
Identification of protein interacting sites is an important task in computational molecular biology. As more and more protein sequences are deposited without available structural information, it is strongly desirable to predict protein binding regions from their sequences alone. This paper presents a pattern mining approach to tackle this problem. It is observed that a functional region of a protein structure usually consists of several peptide segments linked by large wildcard regions. Thus, the proposed mining technique considers large irregular gaps when growing patterns, in order to find residues that are simultaneously conserved but widely separated in the sequences. A derived pattern is called a cluster-like pattern since the discovered conserved residues are always grouped into several blocks, each of which corresponds to a locally conserved region of the protein sequence.
Results
The experiments conducted in this work demonstrate that the derived long patterns automatically discover the important residues that form one or several hot regions of protein-protein interactions. The methodology is evaluated by conducting experiments on the web server MAGIIC-PRO based on a well-known benchmark containing 220 protein chains from 72 distinct complexes. Among the 218 proteins tested, 900 sequential blocks were discovered, 4.25 blocks per protein chain on average. About 92% of the derived blocks are observed to be clustered in space with at least one of the other blocks, and about 66% of the blocks are found to be near the interface of protein-protein interactions. In summary, for about 83% of the tested proteins, at least two interacting blocks can be discovered by this approach.
Conclusion
This work aims to demonstrate that the important residues associated with the interface of protein-protein interactions may be automatically discovered by sequential pattern mining. The detected regions are highly conserved and are therefore regarded as computational hot regions. This information would be useful for characterizing protein sequences, predicting protein function, finding potential partners, and facilitating protein docking for drug discovery.
doi:10.1186/1471-2105-8-S5-S8
PMCID: PMC1892096
PMID: 17570867
Aedes aegypti utilizes blood for energy production, egg maturation and replenishment of maternal reserves. The principal midgut enzymes responsible for bloodmeal digestion are endoproteolytic serine-type proteases within the S1.A subfamily. While there are hundreds of serine protease-like genes in the A. aegypti genome, only five are known to be expressed in the midgut. We describe the cloning, sequencing and expression profiling of seven additional serine proteases and provide a genomic and phylogenetic assessment of these findings. Of the seven genes, four are constitutively expressed and three are transcriptionally induced upon blood feeding. The amount of transcriptional induction is strongly correlated among these genes. Alignments reveal that, in general, the conserved catalytic triad, active site and accessory catalytic residues are maintained in these genes and phylogenetic analysis shows that these genes fall within three distinct clades: trypsins, chymotrypsins and serine collagenases. Interestingly, a previously described trypsin consistently arose with other serine collagenases in phylogenetic analyses. These results suggest that multiple gene duplications have arisen within the S1.A subfamily of midgut serine proteases and/or that A. aegypti has evolved an array of proteases with a broad range of substrate specificities for rapid, efficient digestion of bloodmeals.
doi:10.1016/j.jinsphys.2010.01.003
PMCID: PMC2878907
PMID: 20100490
Aedes aegypti; Midgut; Serine proteases
The gene encoding subtilisin-like protease T. kodakaraensis subtilisin was cloned from a hyperthermophilic archaeon Thermococcus kodakaraensis KOD1. T. kodakaraensis subtilisin is a member of the subtilisin family and composed of 422 amino acid residues with a molecular weight of 43,783. It consists of a putative presequence, prosequence, and catalytic domain. Like bacterial subtilisins, T. kodakaraensis subtilisin was overproduced in Escherichia coli in a form with a putative prosequence in inclusion bodies, solubilized in the presence of 8 M urea, and refolded and converted to an active molecule. However, unlike bacterial subtilisins, in which the prosequence was removed from the catalytic domain by autoprocessing upon refolding, T. kodakaraensis subtilisin was refolded in a form with a putative prosequence. This refolded protein of recombinant T. kodakaraensis subtilisin which is composed of 398 amino acid residues (Gly−82 to Gly316), was purified to give a single band on a sodium dodecyl sulfate (SDS)-polyacrylamide gel and characterized for biochemical and enzymatic properties. The good agreement of the molecular weights estimated by SDS-polyacrylamide gel electrophoresis (44,000) and gel filtration (40,000) suggests that T. kodakaraensis subtilisin exists in a monomeric form. T. kodakaraensis subtilisin hydrolyzed the synthetic substrate N-succinyl-Ala-Ala-Pro-Phe-p-nitroanilide only in the presence of the Ca2+ ion with an optimal pH and temperature of pH 9.5 and 80°C. Like bacterial subtilisins, it showed a broad substrate specificity, with a preference for aromatic or large nonpolar P1 substrate residues. However, it was much more stable than bacterial subtilisins against heat inactivation and lost activity with half-lives of >60 min at 80°C, 20 min at 90°C, and 7 min at 100°C.
doi:10.1128/AEM.67.6.2445-2452.2001
PMCID: PMC92893
PMID: 11375149
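The half-lives quoted in the abstract above (20 min at 90°C, 7 min at 100°C) can be turned into rate constants and a rough temperature dependence. The sketch below assumes first-order (exponential) inactivation and Arrhenius behaviour between the two temperatures; neither assumption is stated in the abstract, so the resulting apparent activation energy of inactivation is only an illustrative estimate.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)

# Half-lives reported in the abstract (minutes), keyed by temperature in deg C.
HALF_LIFE_MIN = {90: 20.0, 100: 7.0}


def rate_constant(half_life):
    """First-order rate constant from a half-life (assumes exponential decay)."""
    return math.log(2) / half_life


def residual_activity(t, half_life):
    """Fraction of activity left after time t (same units as half_life)."""
    return math.exp(-rate_constant(half_life) * t)


def apparent_ea(t1_c, t2_c):
    """Two-point Arrhenius estimate of the inactivation energy, in J/mol."""
    k1 = rate_constant(HALF_LIFE_MIN[t1_c])
    k2 = rate_constant(HALF_LIFE_MIN[t2_c])
    t1, t2 = t1_c + 273.15, t2_c + 273.15
    # ln(k2/k1) = (Ea/R) * (1/T1 - 1/T2)
    return R * math.log(k2 / k1) / (1.0 / t1 - 1.0 / t2)


ea_kj = apparent_ea(90, 100) / 1000.0
print(f"k(90 C) = {rate_constant(HALF_LIFE_MIN[90]):.4f} per min")
print(f"activity left after 20 min at 100 C: {residual_activity(20.0, 7.0):.3f}")
print(f"apparent Ea of inactivation: {ea_kj:.0f} kJ/mol")
```

A two-point estimate like this is sensitive to the precision of the reported half-lives, so it should be read as an order-of-magnitude figure.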
Protein dynamics and the underlying networks of intramolecular interactions and communicating residues within the three-dimensional (3D) structure are known to influence protein function and stability, as well as to modulate conformational changes and allostery. The acylaminoacyl peptidase (AAP) subfamily of enzymes belongs to a unique class of serine proteases, the prolyl oligopeptidase (POP) family, which has not yet been thoroughly investigated. POPs have a characteristic multidomain three-dimensional architecture with the active site at the interface of the C-terminal catalytic domain and a β-propeller domain, whose N-terminal region acts as a bridge to the hydrolase domain. In the present contribution, protein dynamics signatures of a hyperthermophilic acylaminoacyl peptidase (AAP) of the prolyl oligopeptidase (POP) family, as well as of a deletion variant and alanine mutants (I12A, V13A, V16A, L19A, I20A), are reported. In particular, we aimed at identifying residues crucial for long-range communication to the catalytic site or for promoting the conformational changes that switch ApAAP from closed to open conformations. Our investigation shows that the N-terminal α1-helix mediates structural intramolecular communication to the catalytic site, contributing to the maintenance of a proper functional architecture of the catalytic triad. The main determinants of the effects induced by the α1-helix are a subset of hydrophobic residues (V16, L19 and I20). Moreover, a subset of residues characterized by relevant interaction networks or coupled motions has been identified, which are likely to modulate the conformational properties at the interdomain interface.
doi:10.1371/journal.pone.0035686
PMCID: PMC3338720
PMID: 22558199
A new extracellular protease (PoSl; Pleurotus ostreatus subtilisin-like protease) from P. ostreatus culture broth has been purified and characterized. PoSl is a monomeric glycoprotein with a molecular mass of 75 kDa, a pI of 4.5, and an optimum pH in the alkaline range. The inhibitory profile indicates that PoSl is a serine protease. The N-terminal and three tryptic peptide sequences of PoSl have been determined. The homology of one internal peptide with conserved sequence around the Asp residue of the catalytic triad in the subtilase family suggests that PoSl is a subtilisin-like protease. This hypothesis is further supported by the finding that PoSl hydrolysis sites of the insulin B chain match those of subtilisin. PoSl activity is positively affected by calcium. A 10-fold decrease in the Km value in the presence of calcium ions can reflect an induced structural change in the substrate recognition site region. Furthermore, Ca2+ binding slows PoSl autolysis, triggering the protein to form a more compact structure. These effects have already been observed for subtilisin and other serine proteases. Moreover, PoSl protease seems to play a key role in the regulation of P. ostreatus laccase activity by degrading and/or activating different isoenzymes.
doi:10.1128/AEM.67.6.2754-2759.2001
PMCID: PMC92935
PMID: 11375191
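To see what a 10-fold decrease in Km, as reported for PoSl in the abstract above, would mean for reaction rates, the Michaelis-Menten equation v = Vmax·[S]/(Km + [S]) can be evaluated at several substrate concentrations. All numerical values in this sketch are hypothetical, and it assumes that Vmax is unchanged by calcium, which the abstract does not state.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten initial rate for substrate concentration s."""
    return vmax * s / (km + s)


# Hypothetical values, chosen only to illustrate a 10-fold drop in Km.
VMAX = 1.0                    # arbitrary units, assumed unchanged by calcium
KM_NO_CA = 1.0                # mM, hypothetical
KM_WITH_CA = KM_NO_CA / 10.0  # 10-fold lower Km, as in the abstract


def fold_gain(s):
    """Ratio of the rate with calcium to the rate without, at substrate s."""
    return mm_rate(s, VMAX, KM_WITH_CA) / mm_rate(s, VMAX, KM_NO_CA)


for s in (0.01, 0.1, 1.0, 100.0):
    print(f"[S] = {s:>6} mM  rate gain with Ca2+ = {fold_gain(s):.2f}x")
```

Under these assumptions the gain approaches 10-fold when substrate is scarce and vanishes when substrate is saturating, which is the usual signature of an effect on substrate binding rather than on turnover.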
Parasite proteases play key roles in several fundamental steps of the Plasmodium life cycle, including haemoglobin degradation, host cell invasion and parasite egress. Plasmodium exit from infected host cells appears to be mediated by a class of papain-like cysteine proteases called ‘serine repeat antigens’ (SERAs). A SERA subfamily, represented by Plasmodium falciparum SERA5, contains an atypical active site serine residue instead of a catalytic cysteine. Members of this SERAser subfamily are abundantly expressed in asexual blood stages, rendering them attractive drug and vaccine targets. In this study, we show by antibody localization and in vivo fluorescent tagging with the red fluorescent protein mCherry that the two P. berghei serine-type family members, PbSERA1 and PbSERA2, display differential expression towards the final stages of merozoite formation. Via targeted gene replacement, we generated single and double gene knockouts of the P. berghei SERAser genes. These loss-of-function lines progressed normally through the parasite life cycle, suggesting a specialized, non-vital role for serine-type SERAs in vivo. Parasites lacking PbSERAser showed increased expression of the cysteine-type PbSERA3. Compensatory mechanisms between distinct SERA subfamilies may thus explain the absence of phenotypical defect in SERAser disruptants, and challenge the suitability to develop potent antimalarial drugs based on specific inhibitors of Plasmodium serine-type SERAs.
doi:10.1111/j.1462-5822.2009.01419.x
PMCID: PMC2878606
PMID: 20039882
The tautomerase superfamily consists of structurally homologous proteins that are characterized by a β–α–β fold and a catalytic amino-terminal proline. 4-Oxalocrotonate tautomerase (4-OT) family members have been identified and categorized into five subfamilies on the basis of multiple sequence alignments and the conservation of key catalytic and structural residues. Representative members from two subfamilies have been cloned, expressed, purified, and subjected to kinetic and structural characterization. The crystal structure of DmpI from Helicobacter pylori (HpDmpI), a 4-OT homologue in subfamily 3, has been determined to high resolution (1.8 Å and 2.1 Å) in two different space groups. HpDmpI is a homohexamer with an active site cavity that includes Pro-1, but lacks the equivalent of Arg-11 and Arg-39 found in 4-OT. Instead, the side chain of Lys-36 replaces that of Arg-11 in a manner similar to that observed in the trimeric macrophage migration inhibitory factor (MIF), which is the title protein of another family in the superfamily. The electrostatic surface of the active site is also quite different and suggests that HpDmpI might prefer small, monoacid substrates. A kinetic analysis of the enzyme is consistent with the structural analysis, but a biological role for the enzyme remains elusive. The crystal structure of DmpI from Archaeoglobus fulgidus (AfDmpI), a 4-OT homologue in subfamily-4, has been determined to 2.4 Å resolution. AfDmpI is also a homohexamer, with a proposed active site cavity that includes Pro-1, but lacks any other residues that are readily identified as catalytic ones related to 4-OT activity. Indeed, the electrostatic potential of the active site differs significantly in that it is mostly neutral, in contrast to the usual electropositive features found in other 4-OT family members, suggesting that AfDmpI might accommodate hydrophobic substrates. A kinetic analysis has been carried out, but does not provide any clues about the type of reaction the enzyme might catalyze.
doi:10.1016/j.bioorg.2010.07.002
PMCID: PMC2963697
PMID: 20709352
4-oxalocrotonate tautomerase; catalytic proline; hexamer
Background
The study of functional subfamilies of protein domain families and the identification of the residues which determine substrate specificity is an important question in the analysis of protein domains. One way to address this question is the use of clustering methods for protein sequence data and approaches to predict functional residues based on such clusterings. The locations of putative functional residues in known protein structures provide insights into how different substrate specificities are reflected on the protein structure level.
Results
We have developed an extension of the context-specific independence mixture model clustering framework which allows for the integration of experimental data. As these are usually known only for a few proteins, our algorithm implements a partially-supervised learning approach. We discover domain subfamilies and predict functional residues for four protein domain families: phosphatases, pyridoxal dependent decarboxylases, WW and SH3 domains to demonstrate the usefulness of our approach.
Conclusion
The partially-supervised clustering revealed biologically meaningful subfamilies even for highly heterogeneous domains and the predicted functional residues provide insights into the basis of the different substrate specificities.
doi:10.1186/1472-6807-9-68
PMCID: PMC2777906
PMID: 19857261
Peroxiredoxins (Prxs) are a widespread and highly expressed family of cysteine-based peroxidases that react very rapidly with H2O2, organic peroxides, and peroxynitrite. Correct subfamily classification has been problematic since Prx subfamilies are frequently not correlated with phylogenetic distribution and diverge in their preferred reductant, oligomerization state, and tendency towards overoxidation. We have developed a method that uses the Deacon Active Site Profiler (DASP) tool to extract functional site profiles from structurally characterized proteins, to computationally define subfamilies, and to identify new Prx subfamily members from GenBank(nr). For the 58 literature-defined Prx test proteins, 57 were correctly assigned and none were assigned to the incorrect subfamily. The >3500 putative Prx sequences identified were then used to analyze residue conservation in the active site of each Prx subfamily. Our results indicate that the existence and location of the resolving cysteine varies in some subfamilies (e.g. Prx5) to a greater degree than previously appreciated and that interactions at the A interface (common to Prx5, Tpx and higher order AhpC/Prx1 structures) are important for stabilization of the correct active site geometry. Interestingly, this method also allows us to further divide the AhpC/Prx1 into four groups that are correlated with functional characteristics. The DASP method provides more accurate subfamily classification than PSI-BLAST for members of the Prx family and can now readily be applied to other large protein families.
doi:10.1002/prot.22936
PMCID: PMC3065352
PMID: 21287625
functional site profile; mechanistic determinants; function annotation; misannotation; thiol peroxidase; thioredoxin peroxidase; AhpC; Prx; BCP; Tpx
A gene encoding a subtilisin-like protease, designated islandisin, from the extremely thermophilic bacterium Fervidobacterium islandicum (DSMZ 5733) was cloned and actively expressed in Escherichia coli. The gene was identified by PCR using degenerate primers based on conserved regions around two of the three catalytic residues (Asp, His, and Ser) of subtilisin-like serine protease-encoding genes. Using inverse PCR on the regions flanking the catalytic residues, the gene could be cloned. Sequencing revealed an open reading frame of 2,106 bp. The deduced amino acid sequence indicated that the enzyme is synthesized as a proenzyme with a putative signal sequence of 33 amino acids (aa) in length. The mature protein contains the three catalytic residues (Asp177, His215, and Ser391) and has a length of 668 aa. Amino acid sequence comparison and phylogenetic analysis indicated that this enzyme could be classified as a subtilisin-like serine protease in the subgroup of thermitase. The whole gene was amplified by PCR, ligated into pET-15b, and successfully expressed in E. coli BL21(DE3)pLysS. The recombinant islandisin was purified by heat denaturation, followed by hydroxyapatite chromatography. The enzyme is active at a broad range of temperatures (60 to 80°C) and pHs (pH 6 to 8.5) and shows optimal proteolytic activity at 80°C and pH 8.0. Islandisin is resistant to a number of detergents and solvents and shows high thermostability over a long period of time (up to 32 h) at 80°C with a half-life of 4 h at 90°C and 1.5 h at 100°C.
doi:10.1128/AEM.71.7.3951-3958.2005
PMCID: PMC1168981
PMID: 16000809
Pestiviruses are the only members of the Flaviviridae that encode a nonstructural protease at the N terminus of their polyproteins. This N-terminal protease (Npro) cleaves itself off of the nascent polyprotein autocatalytically and thereby generates the N terminus of the adjacent viral capsid protein C. In previous reports, sequence similarities between Npro and the catalytic residues of papain-like cysteine proteases were put forward. To test this hypothesis, substitutions of cysteine and histidine residues within Npro were carried out by site-directed mutagenesis. Translation of the mutagenized Npro-C proteins in cell-free lysates confirmed that only the predicted Cys69 was an essential amino acid for proteolysis, not His130. Further essential residues were identified with His49 and Glu22. While it remains speculative whether Glu22-His49-Cys69 actually build a catalytic triad, these results invalidate the assumption that Npro is a papain-like cysteine protease.
PMCID: PMC109561
PMID: 9499122
The serine repeat antigen (SERA) proteins of the malaria parasites Plasmodium spp. contain a putative enzyme domain similar to that of papain family cysteine proteases. In Plasmodium falciparum parasites, more than half of the SERA family proteins, including the most abundantly expressed form, SERA5, have a cysteine-to-serine substitution within the putative catalytic triad of the active site. Although SERA5 is required for blood-stage parasite survival, the occurrence of a noncanonical catalytic triad casts doubt on the importance of the enzyme domain in this function. We used phage display to identify a small (14-residue) disulfide-bonded cyclic peptide (SBP1) that targets the enzyme domain of SERA5. Biochemical characterization of the interaction shows that it is dependent on the conformation of both the peptide and protein. Addition of this peptide to parasite cultures compromised development of late-stage parasites compared to that of control parasites or those incubated with equivalent amounts of the carboxymethylated peptide. This effect was similar in two different strains of P. falciparum as well as in a transgenic strain where the gene encoding the related serine-type parasitophorous vacuole protein SERA4 was deleted. In compromised parasites, the SBP1 peptide crosses both the erythrocyte and parasitophorous vacuole membranes and accumulates within the parasitophorous vacuole. In addition, both SBP1 and SERA5 were identified in the parasite cytosol, indicating that the plasma membrane of the parasite was compromised as a result of SBP1 treatment. These data implicate an important role for SERA5 in the regulation of the intraerythrocytic development of late-stage parasites and as a target for drug development.
doi:10.1128/IAI.00278-08
PMCID: PMC2519404
PMID: 18591232
Recently we tentatively identified, by sequence comparison, central domains of the NS3 proteins of flaviviruses and the respective portion of the pestivirus polyprotein as RNA helicases (A.E.G. et al., submitted). Alignment of the N-proximal domains of the same proteins revealed conservation of short sequence stretches resembling those around the catalytic Ser, His and Asp residues of chymotrypsin-like proteases. A statistically significant similarity has been detected between the sequences of these domains and those of the C-terminal serine protease domains of alphavirus capsid proteins. It is suggested that flavivirus NS3 and the respective pestivirus protein contain at least two functional domains, the N-proximal protease and the C-proximal helicase one. The protease domain is probably involved in the processing of viral non-structural proteins.
PMCID: PMC317867
PMID: 2543956
Serine proteases are an abundant class of enzymes that are involved in a wide range of physiological processes and are classified into clans sharing structural homology. The active site of the subtilisin-like clan contains a catalytic triad in the order Asp, His, Ser (S8 family) or a catalytic tetrad in the order Glu, Asp and Ser (S53 family). The core structure and active site geometry of these proteases are of interest for many applications. The aim of this study was to investigate the structural properties of different S8 family serine proteases from a diverse range of taxa using molecular modeling techniques. In conjunction with 12 experimentally determined three-dimensional structures of S8 family members, our predicted structures from an archaeon, protozoan and a plant were used for analysis of the catalytic core. Amino acid sequences were obtained from the MEROPS database and submitted to the LOOPP server for threading-based structure prediction. The predicted structures were refined and validated using PROCHECK, SCWRL and MODELYN. Investigation of secondary structures and electrostatic surface potential was performed using MOLMOL. Encompassing a wide range of taxa, our structural analysis provides an evolutionary perspective on S8 family serine proteases. Focusing on the common core containing the catalytic site of the enzyme, the analysis presented here is beneficial for future molecular modeling strategies and structure-based rational drug design.
PMCID: PMC3218418
PMID: 22125392
serine protease; SB clan; S8 family; homology; threading; modeling
Background
Automatic extraction of motifs from biological sequences is an important research problem in the study of molecular biology. For proteins, it is desirable to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually widely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and then proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to greatly reduce the mining cost.
Results
WildSpan is shown to efficiently find W-patterns containing conserved residues that are far separated in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and performance of WildSpan. The protein-based mining mode of WildSpan is developed for discovering functional regions of a single protein by referring to a set of related sequences (e.g. its homologues). The discovered W-patterns are used to characterize the protein sequence and the results are compared with the conserved positions identified by multiple sequence alignment (MSA). The family-based mining mode of WildSpan is developed for extracting sequence signatures for a group of related proteins (e.g. a protein family) for protein function classification. In this situation, the discovered W-patterns are compared with PROSITE patterns as well as the patterns generated by three existing methods performing the similar task. Finally, analysis on execution time of running WildSpan reveals that the proposed pruning strategy is effective in improving the scalability of the proposed algorithm.
Conclusions
The mining experiments conducted in this study show that WildSpan is efficient and effective in discovering functional signatures of proteins directly from sequences. The proposed pruning strategy is effective in improving the scalability of WildSpan. This study also demonstrates that the W-patterns discovered by WildSpan provide useful information for characterizing protein sequences. The WildSpan executable and open-source code are available on the web (http://biominer.csie.cyu.edu.tw/wildspan).
doi:10.1186/1748-7188-6-6
PMCID: PMC3082213
PMID: 21453542
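The W-pattern form described in the WildSpan abstract above (conserved blocks interleaved with large irregular gaps) can be illustrated with a small matcher. This is a minimal sketch and not the WildSpan algorithm: it only tests a given pattern against sequences by translating it into a regular expression. The function names, the `x` wildcard convention, the gap bounds, and the toy sequences are all hypothetical.

```python
import re


def w_pattern_regex(blocks, gaps):
    """Compile blocks separated by bounded gaps into a regular expression.

    blocks: strings of amino-acid letters; lowercase 'x' is one wildcard.
    gaps:   (min_len, max_len) for the gap between consecutive blocks.
    """
    if len(gaps) != len(blocks) - 1:
        raise ValueError("need exactly one gap between each pair of blocks")
    parts = []
    for i, block in enumerate(blocks):
        parts.append("(" + block.replace("x", ".") + ")")
        if i < len(gaps):
            lo, hi = gaps[i]
            parts.append(".{%d,%d}" % (lo, hi))
    return re.compile("".join(parts))


def support(pattern, sequences):
    """Fraction of sequences that contain at least one match."""
    return sum(1 for s in sequences if pattern.search(s)) / len(sequences)


# Hypothetical toy sequences, not real proteins.
TOY_SEQS = [
    "MK" + "DTG" + "A" * 12 + "HGT" + "C" * 20 + "GTSMA" + "KK",
    "DSG" + "L" * 8 + "HGT" + "V" * 30 + "GTSFA",
    "DTG" + "A" * 6 + "HGT" + "A" * 2 + "GTSMA",  # second gap too short
    "MKKLLPPQQ",                                  # no blocks at all
]

two_block = w_pattern_regex(["DxG", "HGT"], [(5, 40)])
three_block = w_pattern_regex(["DxG", "HGT", "GTSxA"], [(5, 40), (5, 40)])

print("support, two blocks:  ", support(two_block, TOY_SEQS))
print("support, three blocks:", support(three_block, TOY_SEQS))
```

Appending a block can never raise a pattern's support, because every sequence that matches the longer pattern also matches its prefix. Properties of this kind are what allow a miner to prune its search, although the specific pruning strategies used by WildSpan are not described in the abstract.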
Sequence analysis of the endoglucanase EGCCA of Clostridium cellulolyticum indicates the existence of two domains: a catalytic domain extending from residue 1 to residue 376 and a reiterated domain running from residue 390 to 450. A small deletion in the C terminal end of the catalytic domain inactivated the protein. From the analysis of the sequences of 26 endoglucanases belonging to family A, we focused on seven amino acids which were totally conserved in all the catalytic domains compared. The roles of two of these, Arg-79 and His-122, were studied and defined on the basis of the mutants obtained by introducing various substitutions. Our findings suggest that Arg-79 is involved in the structural organization of the protein; the His-122 residue seems to be more essential for catalysis. The role of His-123, which is conserved only in subfamily A4, was also investigated.
PMCID: PMC206263
PMID: 1624455
Large-scale automatic annotation of protein sequences remains challenging in the postgenomic era. E1DS is designed for annotating enzyme sequences based on a repository of 1D signatures. The employed sequence signatures are derived using a novel pattern mining approach that discovers long motifs consisting of several sequential blocks (conserved segments). Each of the sequential blocks is considerably conserved among the protein members of an EC group. Moreover, a signature includes at least three sequential blocks that are concurrently conserved, i.e. frequently observed together in sequences. In other words, a sequence signature consists of residues from multiple regions of the protein sequence, which echoes the observation that an enzyme catalytic site is usually composed of residues that are widely separated in the sequence. E1DS currently contains 5421 sequence signatures that in total cover 932 four-digit EC numbers. E1DS is evaluated based on a collection of enzymes with catalytic sites annotated in the Catalytic Site Atlas. When compared to the well-known pattern database PROSITE, predictions based on E1DS signatures are considered more sensitive in identifying catalytic sites and the involved residues. E1DS is available at http://e1ds.ee.ncku.edu.tw/ and a mirror site can be found at http://e1ds.csbb.ntu.edu.tw/.
doi:10.1093/nar/gkn324
PMCID: PMC2447799
PMID: 18524800
A secreted chlamydial protease designated CPAF (Chlamydial Protease/proteasome-like Activity Factor) degrades host proteins, enabling Chlamydia to evade host defenses and replicate. The mechanistic details of CPAF action, however, remain obscure. We used a computational approach to search the protein data bank for structures that are compatible with the CPAF amino acid sequence. The results reveal that CPAF possesses a fold similar to that of the catalytic domains of the tricorn protease from Thermoplasma acidophilum, and that CPAF residues H105, S499, and E558 are structurally analogous to the tricorn protease catalytic triad residues H746, S965, and D1023. Substitution of these putative CPAF catalytic residues blocked the CPAF from degrading substrates in vitro, while the wild type and a noncatalytic control mutant of CPAF remained cleavage-competent. Substrate cleavage is also correlated with processing of CPAF into N-terminal (CPAFn) and C-terminal (CPAFc) fragments, suggesting that these putative catalytic residues may also be required for CPAF maturation.
doi:10.1016/j.abb.2009.01.014
PMCID: PMC2768414
PMID: 19388144
Chlamydia; CPAF; Hidden Markov Models; HHPRED; MODELLER; molecular modeling; tricorn protease; catalytic triad; site directed mutagenesis; protein structure prediction
S6K1 is a member of the AGC subfamily of serine-threonine protein kinases whose catalytic activation requires dual phosphorylation of critical residues in the conserved T-loop (T229) and hydrophobic motif (HM; T389) peptide regions of its catalytic kinase domain (residues 1-398). In addition to its kinase domain, S6K1 contains a C-terminal autoinhibitory domain (AID; residues 399-502), which prevents T-loop and HM phosphorylation; autoinhibition is relieved on multi-site Ser-Thr phosphorylation of the AID (S411, S418, T421, and S424). Interestingly, 66 of the 104 C-terminal AID amino acid residues were computer predicted to exist in structurally disordered peptide regions, raising the question of how such dynamics could be coupled to autoregulation. To begin addressing this issue, we developed and optimized protocols for efficient AID expression and purification. Consistent with computer predictions, aberrant mobilities in both SDS-PAGE and size-exclusion chromatography, as well as low chemical shift dispersion in 1H-15N HSQC NMR spectra, indicated purified recombinant AID to be largely unfolded. Yet, trans-addition of purified AID effectively inhibited PDK1-catalyzed T-loop phosphorylation of a catalytic kinase domain construct of S6K1. Using an identical purification protocol, similar protein yields of a tetraphospho-mimic mutant AID(D2ED) construct were obtained; this construct displayed only weak inhibition of PDK1-catalyzed T229 phosphorylation. Purification of the structurally ‘disordered’ and functional C-terminal AID and AID(D2ED) constructs will facilitate studies aimed at understanding the role of conformational plasticity and protein phosphorylation in modulating autoregulatory domain-domain interactions.
doi:10.1016/j.pep.2007.09.014
PMCID: PMC2276620
PMID: 17980619
custom gene synthesis; intrinsic disorder; disordered proteins; autoinhibition; phosphoinositide-dependent protein kinase-1; PDK1; proteolysis inhibition; minimal media
This paper presents a web service named MAGIIC-PRO, which aims to discover functional signatures of a query protein by sequential pattern mining. Automatic discovery of patterns from unaligned biological sequences is an important problem in molecular biology. MAGIIC-PRO differs from several previously established methods performing similar tasks in two major ways. The first notable feature of MAGIIC-PRO is its efficiency in delivering long patterns. By incorporating a new type of gap constraint and some state-of-the-art data mining techniques, MAGIIC-PRO usually identifies satisfactory patterns within an acceptable response time. This efficiency enables users to quickly discover functional signatures whose residues do not come from a single region of the protein sequences or are conserved in only a few members of a protein family. The second notable feature of MAGIIC-PRO is its effort in refining the mining results. Considering large flexible gaps improves the completeness of the derived functional signatures. Users can be guided directly to the patterns with as many blocks as are conserved simultaneously. In this paper, we show by experiments that MAGIIC-PRO is efficient and effective in identifying ligand-binding sites and hot regions in protein–protein interactions directly from sequences. The web service is available at and a mirror site at .
doi:10.1093/nar/gkl309
PMCID: PMC1538832
PMID: 16845025
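The two ingredients named in the MAGIIC-PRO abstract, blocks conserved across unaligned sequences and gap constraints between them, can be sketched in a few lines of Python. This is a deliberately naive illustration under stated assumptions: fixed-length k-mers stand in for blocks, support is the fraction of sequences containing a pattern, and the toy sequences and function names are invented for the example. It does not reproduce the mining algorithm used by MAGIIC-PRO.

```python
import re
from collections import Counter


def frequent_blocks(sequences, k=3, min_support=1.0):
    """Return k-mers whose support (fraction of sequences containing them)
    is at least min_support."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each sequence contributes at most once per k-mer
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    n = len(sequences)
    return {kmer: c / n for kmer, c in counts.items() if c / n >= min_support}


def ordered_pair_support(sequences, first, second, min_gap, max_gap):
    """Fraction of sequences in which `first` is followed by `second`
    with between min_gap and max_gap residues in between."""
    pattern = re.compile(
        f"{re.escape(first)}.{{{min_gap},{max_gap}}}{re.escape(second)}"
    )
    hits = sum(1 for seq in sequences if pattern.search(seq))
    return hits / len(sequences)


# Toy, unaligned sequences (hypothetical, for illustration only)
seqs = ["MDSGAAAAHGTK", "DSGLLLLLLHGT", "KKDSGPHGTKK", "HGTAAAADSG"]

print(frequent_blocks(seqs, k=3, min_support=1.0))
print(ordered_pair_support(seqs, "DSG", "HGT", 2, 10))
print(ordered_pair_support(seqs, "DSG", "HGT", 0, 10))
```

In this toy set both blocks occur in every sequence, but the ordered, gap-constrained pair is supported by fewer sequences, and widening the allowed gap range raises that support. This is one way to read the abstract's remark that flexible gaps affect the completeness of the derived signatures.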
The eukaryotic calpains are a family of calcium-dependent papain-like proteases and their non-enzymatic relatives whose varied physiological functions are beginning to be fully explored.
The calpain family is named for the calcium dependence of the papain-like, thiol protease activity of the well-studied ubiquitous vertebrate enzymes calpain-1 (μ-calpain) and calpain-2 (m-calpain). Proteins showing sequence relatedness to the catalytic core domains of these enzymes are included in this ancient and diverse eukaryotic protein family. Calpains are examples of highly modular organization, with several varieties of amino-terminal or carboxy-terminal modules flanking a conserved core. Acquisition of the penta-EF-hand module involved in calcium binding (and the formation of heterodimers for some calpains) seems to be a relatively late event in calpain evolution. Several alternative mechanisms for binding calcium and associating with membranes/phospholipids are found throughout the family. The gene family is expanded in mammals, trypanosomes and ciliates, with up to 26 members in Tetrahymena, for example; in striking contrast to this, only a single calpain gene is present in many other protozoa and in plants. The many isoforms of calpain and their multiple splice variants complicate the discussion and analysis of the family, and challenge researchers to ascertain the relationships between calpain gene sequences, protein isoforms and their distinct or overlapping functions. In mammals and plants it is clear that a calpain plays an essential role in development. There is increasing evidence that ubiquitous calpains participate in a variety of signal transduction pathways and function in important cellular processes of life and death. In contrast to relatively promiscuous degradative proteases, calpains cleave only a restricted set of protein substrates and use complex substrate-recognition mechanisms, involving primary and secondary structural features of target proteins. The detailed physiological significance of both proteolytically active calpains and those lacking key catalytic residues requires further study.
doi:10.1186/gb-2007-8-6-218
PMCID: PMC2394746
PMID: 17608959
A lumbrokinase gene encoding a blood-clot dissolving protein was cloned from earthworm (Eisenia fetida) by RT-PCR amplification. The gene, designated CST1 (GenBank No. AY840996), was sequenced and analyzed. The cDNA consists of 888 bp with an open reading frame of 729 bp, which encodes 242 amino acid residues. Multiple sequence alignments revealed that CST1 shares similarities and conserved amino acids with other reported lumbrokinases. The amino acid sequence of CST1 exhibits structural features similar to those found in other serine proteases, including human tissue-type (tPA), urokinase (uPA), and vampire bat (DSPAα1) plasminogen activators. CST1 contains the conserved catalytic triad found in the active sites of these proteases; these residues are important for polypeptide cleavage. CST1 was expressed as inclusion bodies in Escherichia coli BL21(DE3). The molecular mass of recombinant CST1 (rCST1) was 25 kDa as estimated by SDS–PAGE, and was further confirmed by Western blot analysis. His-tagged rCST1 was purified and renatured using nickel-chelating resin with a recovery rate of 50% and a purity of 95%. The purified, renatured rCST1 showed fibrinolytic activity when evaluated by both a fibrin plate and a blood clot lysis assay. rCST1 degraded fibrin on the fibrin plate. A significant percentage (65.7%) of blood clot lysis was observed when a blood clot was treated with 80 mg/mL of rCST1 in vitro. The antithrombotic activity of rCST1 was 912 units/mg, calculated by comparison with the activity of a lumbrokinase standard. These findings indicate that rCST1 has potential as a potent blood-clot treatment. Therefore, the expression and purification of a single lumbrokinase represents an important improvement in the use of lumbrokinases.
doi:10.1371/journal.pone.0053110
PMCID: PMC3531398
PMID: 23300872
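The sequence figures in the CST1 abstract can be cross-checked with simple arithmetic. The sketch below assumes that the 729-bp open reading frame includes the stop codon, and it uses the common rule-of-thumb average of about 110 Da per residue for a rough mass estimate. Both are assumptions made for this illustration, not statements from the abstract, and the function names are hypothetical.

```python
def orf_to_residues(orf_bp, includes_stop=True):
    """Number of encoded amino acids for an ORF of orf_bp base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon, if counted in the ORF, does not encode a residue
    return codons - 1 if includes_stop else codons


def approx_mass_kda(n_residues, avg_residue_da=110.0):
    """Very rough protein mass from residue count (rule-of-thumb average)."""
    return n_residues * avg_residue_da / 1000.0


residues = orf_to_residues(729)     # 243 codons, one of them the stop codon
print(residues)                     # residue count
print(approx_mass_kda(residues))    # rough mass in kDa
```

Under these assumptions, 729 bp corresponds to 242 residues, matching the abstract, and the rough mass estimate of about 27 kDa is in the same range as the 25 kDa estimated by SDS–PAGE.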
We report here the first structure of a member of the IgA protease family at 1.75 Å resolution. This protease is a founding member of the Type V (autotransporter) secretion system and is considered a virulence determinant among the bacteria expressing the enzyme. The structure of the enzyme fits that of a classical autotransporter in which several unique domains necessary for protein function are appended to a central, 100 Å long β-helical domain. The N-terminal domain of the IgA protease is found to possess a chymotrypsin-like fold. However, this catalytic domain contains a unique loop D that extends over the active site, acting as a lid that gates substrate access. The data presented provide a structural basis for the known ability of IgA proteases to cleave only the P/S/T rich hinge peptide unique to IgA1 in the context of the intact fold of the immunoglobulin. Based upon the structural data as well as molecular modeling, a model is presented that suggests the unique, extended loop D in this IgA protease sterically occludes the active site binding cleft in the absence of immunoglobulin binding. Only when the IgA1 immunoglobulin Fc domain binds in a valley formed between the N-terminal protease domain and another domain appended to the β-helix spine (domain-2) is the lid stabilized in an open conformation. The stabilization of this open conformation through Fc association subsequently allows access of the hinge peptide to the active site, resulting in recognition and cleavage of the substrate.
doi:10.1016/j.jmb.2009.04.041
PMCID: PMC2720633
PMID: 19393662