The aim of this study was to use comparative modeling to predict the three-dimensional structure of the CHAPK protein (cysteine, histidine-dependent amidohydrolase/peptidase domain of the LysK endolysin, derived from bacteriophage K). Iterative PSI-BLAST searches against the Protein Data Bank (PDB) and nonredundant (nr) databases were used to populate a multiple alignment for analysis using the T-Coffee Expresso server. A consensus Maximum Parsimony phylogenetic tree with a bootstrap analysis setting of 1,000 replicates was constructed using MEGA4. Structural templates relevant to our target (CHAPK) were identified, processed in Expresso and used to generate a 3D model in the alignment mode of SWISS-MODEL. These templates were also processed in the I-TASSER web server. A Staphylococcus saprophyticus CHAP domain protein, 2K3A, was identified as the structural template in both servers. The I-TASSER server generated the CHAPK model with the best bond geometries when analyzed using PROCHECK and the most logical organization of the structure. The predicted 3D model indicates that CHAPK has a papain-like fold. Circular dichroism spectropolarimetry also indicated that CHAPK has an αβ fold, which is consistent with the model presented. The putative active site maintained a highly conserved Cys54-His117-Glu134 charge relay and an oxyanion hole residue Asn136. The residue triplet, Cys-His-Glu, is known to be a viable proteolytic triad in which we predict the Cys residue is used in a nucleophilic attack on peptide bonds at a specific site in the pentaglycine cross bridge of staphylococcal cell wall peptidoglycan. Use of comparative modeling has allowed approximation of the 3D structure of CHAPK giving information on the structure and an insight into the binding and active site of the catalytic domain. This may facilitate its development as an alternative antibacterial agent.
Staphylococcus aureus is a common pathogen that plays a major role in various human and animal diseases ranging from skin and soft tissue infections to more serious cases of pneumonia, endocarditis, meningitis and osteomyelitis.1 Treatment of these infections has become increasingly difficult due to the worldwide prevalence of multidrug-resistant strains, including methicillin-resistant S. aureus (MRSA), which is a frequent cause of serious nosocomial infections.2 As a consequence, it is critical to develop new and effective antibacterials with the potential to eliminate such infections irrespective of antibiotic sensitivity.
Over the past decade numerous studies have focused on developing recombinant bacteriophage (phage)-encoded cell wall hydrolases, termed endolysins (lysins), as novel antibacterial agents as recently reviewed by Loessner (2005),3 Fischetti (2008)4 and Fenton et al. (2010).5 When applied exogenously as purified recombinant proteins to Gram-positive bacteria, lysins bring about rapid cell lysis and bacterial death.3,6,7 It is this ability to rapidly lyse pathogenic Gram-positive cells upon direct contact with peptidoglycan, also termed “lysis from without,” that has laid the foundation for their exploitation as novel therapeutics.6
The majority of lysins display a modular structure, usually comprising at least one N-terminal catalytic domain which attacks bacterial cell wall peptidoglycan, combined with a C-terminal cell wall binding domain which directs the lytic domain to its site of action.8,9 In the case of staphylococcal lysins the presence of three domains, comprising an N-terminal cysteine, histidine-dependent amidohydrolase/peptidase (CHAP) domain and an amidase-2 domain linked to a C-terminal SH3b cell wall binding domain, is a common feature. This organization has been observed in LysK,10 LysWMY,11 Φ11 lysin,12 MV-L,13 LysH5,14 LysGH15,15 and SAL-1.16
Studies have shown that of the two lytic domains contained within staphylococcal lysins, the CHAP domain confers the principal enzymatic activity of the protein, whereas the amidase domain contributes minimal detectable activity.17-20 Furthermore, a recent study in our laboratory showed that the activity of the single-domain truncated CHAP (later designated CHAPK) was 2-fold higher than that of the native three-domain LysK protein.20 Studies have also demonstrated that the CHAP domain of staphylococcal lysins acts as a D-Ala-Gly endopeptidase, specifically cleaving the peptide bond between D-alanine and the first glycine in the pentaglycine cross-bridge of staphylococcal cell wall peptidoglycan17,18 (Fig. 1).
CHAP proteins contain three highly conserved amino acid residues, two of which are an invariant cysteine (Cys) and histidine (His) along with a third, polar residue such as asparagine (Asn), aspartic acid (Asp) or glutamic acid (Glu).21 These residues are principally involved in catalytic activity, forming part of the active site of the enzyme as well as being equivalent to the catalytic triad of papain-like thiol proteases.21,22 Site-directed mutagenesis studies on these conserved residues within the CHAP domain of the LysWMY staphylococcal lysin resulted in reduced activity, suggesting that a Cys-His-Asn catalytic triad is crucial for enzymatic function.11
The CHAP domain is a member of the NlpC/P60 family of peptidases and can be found in proteins from bacteria, archaea and eukaryotes of the Trypanosomidae family.22 However, very little structural information is available in relation to the CHAP domains of phage lysins. To date, the crystal structures of five lysins have been elucidated: Cpl-1,23 PlyL,24 PlyPSA,25 PlyB,26 and Ply500,27 none of which is derived from a staphylococcal phage and none of which contains a CHAP domain.
In the current study a comparative modeling approach is employed to predict the three-dimensional (3D) structure of the 162-amino-acid CHAP domain protein, CHAPK, derived from the staphylococcal phage K lysin, LysK.10 The potential of CHAPK as an alternative antibacterial agent which displays rapid lytic activity against pathogenic staphylococci including MRSA, both in vitro and in vivo, has previously been demonstrated by our group.20,28
Comparative modeling allows accurate prediction of the 3D structure of a target protein (CHAPK) based on its similarity to a related template protein whose structure has been experimentally resolved by either X-ray crystallography or Nuclear Magnetic Resonance (NMR) spectroscopy.29 The basis of this approach is that evolutionarily related proteins share similar 3D structures if they have a statistically significant sequence similarity (generally > 25% identity).30 The model prediction process consists of four consecutive steps, starting with template selection, followed by target-template alignment, model building and finally model evaluation.31
Resolution of the 3D structure of CHAPK by comparative modeling can provide a valuable insight into the structural basis of interaction with its cell wall peptidoglycan substrate and mechanism of catalytic activity. This knowledge may subsequently be used in the design of novel lysins with altered and improved catalytic domains, or the creation of designer lysins with unique capabilities (e.g., enhanced turnover rate, altered specificity) furthering the development of these unique enzymes as alternative antibacterial agents in the fight against multi-drug resistant bacterial infections.
The first step in the comparative modeling of the target protein, CHAPK, was to identify potential templates whose structure has been resolved by either X-ray crystallography or NMR. The two main criteria for selecting suitable templates for the target were sequence length (choosing proteins of similar length to CHAPK) and percentage identity between the target and template sequence of > 25% over the entire length of the sequence. Generally, proteins with > 25% sequence identity can be labeled homologous and share at least 80% of their structure.30,32 With the aim of collecting the most suitable template sequences to form a strong multiple alignment in the second step of comparative modeling, a PSI-BLAST (Position-Specific Iterated BLAST) search was performed against the Protein Data Bank (PDB) over a number of iterations. This bank is a repository of the 3D structural data of proteins and nucleic acids which have been experimentally obtained by X-ray crystallography or NMR. The most suitable template obtained from the PDB search was the enzyme 2K3A, a 155-amino-acid secreted CHAP protein from Staphylococcus saprophyticus33 with 28% sequence identity to CHAPK. This template returned the lowest E-value (0.01) and largest % coverage (58%). The BLAST search converged after three iterations and no known structure received a significantly high score over a range of scoring matrix and gap penalty settings. This suggested that not many CHAP domain structures with similarity to phage-derived CHAP domains have been experimentally resolved to date.
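The two selection criteria described above, percentage identity and coverage, can be illustrated with a short calculation on a pairwise alignment. The following is a minimal sketch only and is not the scoring performed by PSI-BLAST: the function name `identity_and_coverage`, the toy sequences, and the convention of counting identity over columns in which both sequences have a residue are assumptions made for illustration. Alignment programs differ in the denominator they use, so values computed this way need not match those reported by a given server.

```python
from typing import Tuple


def identity_and_coverage(aligned_target: str, aligned_template: str) -> Tuple[float, float]:
    """Return (percent identity, percent coverage) for a gapped pairwise alignment.

    Identity is counted over columns where both sequences have a residue.
    Coverage is the share of target residues aligned to a template residue.
    Both conventions are assumptions chosen for this illustration.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")

    target_residues = 0   # non-gap positions in the target
    aligned_columns = 0   # columns with a residue in both sequences
    matches = 0           # aligned columns with identical residues

    for t, s in zip(aligned_target, aligned_template):
        if t != "-":
            target_residues += 1
        if t != "-" and s != "-":
            aligned_columns += 1
            if t == s:
                matches += 1

    if target_residues == 0 or aligned_columns == 0:
        return 0.0, 0.0
    identity = 100.0 * matches / aligned_columns
    coverage = 100.0 * aligned_columns / target_residues
    return identity, coverage


# Toy alignment (hypothetical sequences, not CHAPK or 2K3A).
identity, coverage = identity_and_coverage("ACDEFGHIK-", "ACD-FGYIKL")
passes_threshold = identity > 25.0  # the > 25% identity criterion from the text
```

In this toy case seven of eight aligned columns match, so the pair would pass the > 25% identity criterion; a real template search would also weigh E-value and sequence length, as described above.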
In order to populate the multiple alignment in step two, iterative PSI-BLAST searches were also performed in the non-redundant (nr) database. Although the inclusion of homologous sequences from the nr database does not contribute structural information, it aids in constructing a better quality multiple alignment, which is generally regarded as more reliable than a pair-wise structural alignment alone.31
In the second step of comparative modeling, the combination of selected template structures from the PDB and sequences from the nr-database, collected in step one, were uploaded to the T-Coffee Expresso server together with the amino acid sequence of the target (CHAPK). The Expresso suite is a multiple sequence alignment server which accurately aligns sequences that include structural information.34 The resulting multiple alignment of the CHAPK target and template proteins was used to construct a phylogenetic tree and identify templates which were most closely related to the target. A consensus maximum parsimony phylogenetic tree with bootstrap analysis inferred from 1,000 replicates was constructed using the MEGA4 software suite35 and taken to represent the evolutionary history of the protein sequences analyzed (Fig. 2A). Upon examination of the tree, proteins in close proximity to the target, including the protein 2K3A (with experimentally derived 3D structure) were selected and processed again using the Expresso alignment algorithm (Fig. 2B).
For the third step in the process (model building), the above alignment was submitted to the Alignment mode of the web-based homology modeling server, SWISS-MODEL.36 This program facilitates the construction of 3D models of a target protein using experimentally determined structures of related family members.37 Models were constructed using the 2K3A protein and each structurally resolved protein from the clade containing CHAPK in the phylogenetic tree (Fig. 2A). Inspection of predicted 3D CHAPK models in relation to structural compatibility between the target and template, sharing of conserved amino acids and the location of these residues in the overall structure, resulted in the identification of the 2K3A protein as the most promising candidate for accurate modeling of CHAPK.
In model building, accurate alignment of the target sequence with selected template is a crucial step and poor alignments are commonly responsible for inaccurate models.37 In order to examine the quality of the target-template alignment and the models created in SWISS-MODEL Alignment mode, CHAPK was submitted to an additional web-based server, I-TASSER.38 The results from I-TASSER independently identified the 2K3A protein as the most suitable template for constructing an accurate 3D model of CHAPK. Upon inspection and comparison of the predicted models from both I-TASSER (Fig. 3A) and SWISS-MODEL (Fig. 3B) servers, it was obvious that I-TASSER had generated a better quality alignment as its predicted model had a fold and catalytic residue position that was consistent with enzymatic function33 (Fig. 3A).
In the fourth step, the quality of the I-TASSER generated CHAPK model was evaluated using PROCHECK39 and visualized using PyMOL.40 The predicted model had two short α-helices packed against a β-sheet saddle, which is a typical feature of NlpC/P60 papain-related peptidases21 (Fig. 3A). Model refinement was performed using CNS_SOLVE,41 and the resultant structure had good geometries with 54.3% of residues falling in the most favored region of the Ramachandran plot, and only 2.9% outliers. The newly constructed CHAPK model was also compared with the 2K3A CHAP protein template by superposing CHAPK on the template structure using LSQMAN.42 The calculated root mean square distance (r.m.s.d.) value of 1.9 Ångströms (Å) on 143 C-α atoms indicated the expected degree of agreement between the two structures. Inspection of the superimposed structures in PyMOL showed good agreement between CHAPK and the S. saprophyticus CHAP domain throughout the structures (Fig. 4).
Further analysis of secondary structure in the CHAPK model was performed using the software package Define Secondary Structure of Proteins (DSSP).43,44 The predicted structure differed from the parental template model 2K3A in the region between residues 86 and 102, which contains a β-strand in 2K3A (Fig. 5B). In the CHAPK model, however, this region lies somewhat farther from the sheet (Fig. 5A). As a result, these residues are not categorized as a β-strand in the CHAPK model by DSSP43,44 (Fig. 5A).
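For readers unfamiliar with the r.m.s.d. measure quoted above, the calculation over paired C-α atoms can be sketched as follows. This is an illustrative sketch and not the LSQMAN procedure: it assumes the two coordinate sets have already been optimally superposed and paired atom by atom, and the function name `rmsd` and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square distance between two equally sized lists of (x, y, z).

    Assumes the structures are already superposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom is displaced by exactly 1 Å along z.
model_ca = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template_ca = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(model_ca, template_ca)
```

In practice the superposition step (finding the rotation and translation that minimize this quantity) is performed by the fitting program before the value is reported.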
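Secondary structure percentages such as those discussed for the model can be derived from a per-residue DSSP assignment string. The sketch below is illustrative only: the function name `secondary_structure_percent` and the example string are hypothetical, and the choice to count only the DSSP codes `H` (α-helix) and `E` (extended strand) is an assumption; other groupings (for example, including 3-10 helices, code `G`) would give different percentages.

```python
from collections import Counter


def secondary_structure_percent(dssp_string, helix_codes="H", strand_codes="E"):
    """Return percent helix, strand and other for a per-residue DSSP string."""
    if not dssp_string:
        raise ValueError("empty DSSP string")
    counts = Counter(dssp_string)
    n = len(dssp_string)
    helix = sum(counts[c] for c in helix_codes)
    strand = sum(counts[c] for c in strand_codes)
    return {
        "helix": 100.0 * helix / n,
        "strand": 100.0 * strand / n,
        "other": 100.0 * (n - helix - strand) / n,
    }


# Hypothetical 10-residue assignment, not taken from the CHAPK model.
summary = secondary_structure_percent("HHHHEE--TT")
```

A region that DSSP leaves unassigned, such as residues 86 to 102 in the CHAPK model, falls into the "other" class in a tally of this kind.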
Circular dichroism (CD) spectropolarimetry was performed to investigate the presence of an αβ fold, predicted in the CHAPK model. The CD spectrum obtained for CHAPK showed characteristic minima in the range 205–220 nm, typical of a protein containing α-helical and β-sheet structure (Fig. 6). Deconvolution of the spectrum using CDPro45,46 and a test set comprising 43 soluble and 13 membrane proteins indicated that CHAPK contains an αβ fold, which supports the model. The exact α-helix and β-sheet content determined from the spectrum varied depending on which algorithm was used in the CDPro45,46 suite of programs. These data are shown in Table 1. In the case of the SELCON and CONTINLL algorithms, the content of α-helix is close to that found in the model (13.9%), while the CDSSTR algorithm significantly underestimates the α-helical content. For all components of CDPro45,46 the estimated β-sheet structure is at variance with the amount of β-sheet seen in the model (33.9%, 35.2% and 39% for SELCON, CONTINLL and CDSSTR respectively, compared with an estimated 18.7% in the model). This could be explained by additional secondary structure in the protein that is not well modeled. The region from residues 86–102 (Fig. 5A) in the predicted CHAPK model is an area where there is potentially an additional β-strand. This would bring the CHAPK model closer in structure to the 2K3A structure. In addition, the N-terminal region of the 2K3A protein is an area of extended coil in the NMR structure; in CHAPK this region may become more ordered and contribute additional β-sheet structure. Regardless of the precise secondary structure composition, CD spectropolarimetry indicated that the CHAPK protein has an αβ fold, which is consistent with the predicted 3D model.
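The size of the discrepancy in β-sheet content, and how much of it the region 86–102 could account for, can be made explicit with simple arithmetic. The sketch below uses only the values quoted in the text (162 residues, 18.7% β-sheet in the model, and the three CDPro estimates); the variable and function names are hypothetical, and treating every residue in the region as β-strand is an assumed upper bound rather than a prediction.

```python
MODEL_BETA_PERCENT = 18.7
CD_BETA_PERCENT = {"SELCON": 33.9, "CONTINLL": 35.2, "CDSSTR": 39.0}
CHAPK_LENGTH = 162          # residues, as stated for CHAPK
REGION = (86, 102)          # inclusive residue range discussed in the text


def beta_excess(cd_estimates, model_percent):
    """Percentage points by which each CD estimate exceeds the model value."""
    return {name: round(value - model_percent, 1) for name, value in cd_estimates.items()}


def region_percent(first, last, length):
    """Share of the sequence covered by an inclusive residue range, in percent."""
    return round(100.0 * (last - first + 1) / length, 1)


excess = beta_excess(CD_BETA_PERCENT, MODEL_BETA_PERCENT)
region_share = region_percent(REGION[0], REGION[1], CHAPK_LENGTH)
```

Under these assumptions the 17-residue region corresponds to roughly 10.5% of the sequence, so an additional strand there could account for part, but not all, of the 15 to 20 percentage-point gap between the CD estimates and the model.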
Both target and template proteins have highly conserved cysteine and histidine residues, with the Cys54-His117 residues of CHAPK (magenta in Fig. 7) located in approximately the same position as the Cys57-His109 residues in the active site of 2K3A (cyan in Fig. 7). The distance between the Cys57 and His109 catalytic residues of 2K3A was found to be 3.3 Ångströms (Å) (Fig. 7), which is within hydrogen bond distance.33 This distance is a critical feature in imparting enzymatic activity to CHAP proteins. Typically, the presence of a substrate in the Cys-His active site triggers the deprotonation of cysteine by the histidine residue, resulting in a nucleophilic attack by the deprotonated cysteine on a substrate acyl carbonyl carbon, releasing a fragment of the substrate with an N-terminus. This leads to the formation of a tetrahedral ionic intermediate, restoration of the histidine residue to its deprotonated form and finally the release of the carboxylic acid moiety from the cysteine residue via hydrolysis. This results in the regeneration of the Cys-His active site.47,48 The distance of 3.7 Å between the corresponding Cys54-His117 active site residues in the CHAPK model is comparable with that in the active site of the 2K3A template (Fig. 7). It is therefore plausible that the highly conserved CHAPK Cys54-His117 residues have a similar catalytic mode of action. When localized to their staphylococcal peptidoglycan substrate, they initiate a rapid nucleophilic attack on the peptide bond between D-alanine and the first glycine in the pentaglycine cross-bridge, resulting in cleavage of the bacterial cell wall and rapid cell death. To further investigate the proposed role of the conserved Cys54-His117 active-site residues, a QuikChange II Site-Directed Mutagenesis Kit (Agilent Technologies Ireland Ltd) was used to create point mutations; a change from Cys54 to alanine (Ala) or from His117 to Ala resulted in loss of enzymatic activity (data not shown). This strongly indicates that these two residues play a crucial role in CHAPK catalytic activity.
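Inter-residue distances of the kind reported above can be measured directly from the atomic coordinates in a PDB-format file. The following is a minimal sketch under stated assumptions: the helper names (`make_atom_line`, `atom_coords`, `distance`) are hypothetical, the choice of the cysteine SG and histidine ND1 atoms is an assumption (the text does not state which atoms were measured), and the coordinates are placeholders rather than values from the CHAPK model or 2K3A.

```python
import math


def make_atom_line(serial, atom_name, res_name, chain, res_seq, x, y, z):
    """Build a minimal fixed-column PDB ATOM record (placeholder data only)."""
    return (
        "ATOM  " + str(serial).rjust(5) + " "
        + atom_name.ljust(4) + " " + res_name.rjust(3) + " " + chain
        + str(res_seq).rjust(4) + " " * 4
        + "%8.3f%8.3f%8.3f" % (x, y, z)
    )


def atom_coords(pdb_lines, res_name, res_seq, atom_name):
    """Return (x, y, z) for the first ATOM record matching residue and atom."""
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if (line[12:16].strip() == atom_name
                and line[17:20].strip() == res_name
                and int(line[22:26]) == res_seq):
            # Coordinates occupy fixed columns 31-54 of an ATOM record.
            return (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    raise KeyError("atom %s of %s%d not found" % (atom_name, res_name, res_seq))


def distance(a, b):
    """Euclidean distance between two (x, y, z) points, in the input units."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Placeholder records; real coordinates would come from the model file.
lines = [
    make_atom_line(1, "SG", "CYS", "A", 54, 0.0, 0.0, 0.0),
    make_atom_line(2, "ND1", "HIS", "A", 117, 1.0, 2.0, 2.0),
]
separation = distance(atom_coords(lines, "CYS", 54, "SG"),
                      atom_coords(lines, "HIS", 117, "ND1"))
```

Applied to the actual model and template files, the same calculation would give the distances compared in the text, subject to the choice of atoms measured.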
In total, eight conserved residues, Cys54, His117, Glu134, Asn136, Gly116, Gly74, Thr59 and Gln55, at or near the surface of the predicted CHAPK model may be essential for enzymatic activity. These residues coordinate the formation of a broad and shallow cleft which encompasses the active site of the enzyme (Fig. 8). This was consistent with the 2K3A template structure.33 In the 2K3A protein, active site residues Cys57-His109-Glu126 are arranged in a viable proteolytic triad, with both a sequence order and a topology typical of cysteine peptidases and with Asn128 acting as an additional stabilizing group.33 Analysis of the CHAPK model revealed corresponding highly conserved residues constituted by a Cys54-His117-Glu134 charge relay and an oxyanion hole (stabilizing) residue Asn136, near the surface of the active site cleft.
The exact mechanism of binding of CHAPK to its peptidoglycan substrate remains unknown. However, examination of the electrostatic surface potential of the active site of CHAPK revealed an overall electropositive charge (Fig. 8), indicating electrostatic complementarity with the electronegative character of its peptidoglycan substrate. This combination of attractive and repulsive forces may help guide the substrate into the correct position for binding before contacting the active site of CHAPK. In addition, it was suggested that in the case of 2K3A, two exposed tyrosine aromatic groups, Tyr107 and Tyr129, on either side of the active site possibly function as rails to anchor the substrate in position.33 This may also be the case for CHAPK, as it has two exposed tyrosine residues (Tyr112 and Tyr140) on either side of its active site (Fig. 3A) whose locations correspond closely to the positions of Tyr107 and Tyr129 in the 2K3A template structure.
In the present study, comparative modeling was used to construct a 3D model of the phage-derived peptidase, CHAPK. Analysis of the 3D model provided a structural basis for CHAPK activity and presented a probable mechanism involving the highly conserved Cys54-His117-Glu134-Asn136 active site residues. These residues have been identified in other lysins as the key catalytic residues involved in the hydrolysis of a D-Ala-Gly substrate in the pentaglycine cross-bridge of staphylococcal cell wall peptidoglycan. The proposed nucleophilic-attack mechanism of the conserved Cys54 residue on the peptide bond in an acid-base reaction is thought to be the most likely catalytic mechanism for all members of the CHAP NlpC/P60 family of proteases.
Further exploration and investigation of phage lysins and their various domains, particularly at a structural and bioinformatic level, can provide important information for understanding biochemical function and specific substrate interaction mechanisms. This in turn may contribute to specifically engineered designer lysins with diverse and improved applications for development as novel antibacterials. Additional research is currently underway to experimentally resolve the crystal structure of CHAPK.
The following web-based servers, suites, molecular analysis and visualization programs were used in the comparative modeling process for CHAPK structure prediction: Basic Local Alignment Search Tool (BLAST) (www.ncbi.nlm.nih.gov/BLAST/), T-Coffee (www.tcoffee.org), MEGA4,35 SWISS-MODEL (www.swissmodel.expasy.org),36 iterative threading assembly refinement (I-TASSER),38 PyMOL,40 PROCHECK,39 CNS_SOLVE,41 LSQMAN42 and DSSP.43,44 For CD spectropolarimetry, the CHAPK domain was dialysed extensively against 10 mM potassium phosphate buffer, pH 7.0. Protein was diluted to 0.019 mg/mL in the same buffer for analysis. Data were collected from 260 nm to 180 nm on a Chirascan instrument (Applied Photophysics). Data were converted to Δε values prior to analysis. Determination of the secondary structure from the CD spectrum was performed using the CDPro45,46 software package.
This work was financed by Science Foundation Ireland Ref 06/RF/BIM004. We thank Dr. Todd Kagawa for his valuable input in the analysis of CD data.
Previously published online: www.landesbioscience.com/journals/bacteriophage/article/18245