We report the crystal structures of rhodesain•K11002 and TbCatB•CA074, inhibitor complexes of two papain-family cysteine proteases implicated in the pathogenesis of Trypanosoma brucei infection. The structure of rhodesain•K11002 is similar to that of rhodesain•K11777 (PDB ID 2P7U), with the bound inhibitor varying only at the P3 position (N-methyl piperazine in K11777, morpholino urea in K11002). While a number of hydrogen bonds are formed between residues lining the substrate-binding site and the inhibitor backbone, a number of hydrophobic residues also provide binding energy, principally in the S2 subsite, the subsite that confers selectivity for this class of enzyme. This is in contrast with the TbCatB•CA074 complex, where hydrogen bonding between the enzyme and inhibitor dominates over hydrophobic interactions. The phenylsulfone moiety at P1′ is a common motif represented in many parasite cysteine protease•vinylsulfone complexes. The dual conformation of this moiety in the rhodesain•K11002 structure is unique for a parasite cysteine protease•vinylsulfone complex, and we have not observed it in other high-resolution structures of rhodesain or the closely related cruzain from Trypanosoma cruzi.
The TbCatB crystal structure, the first reported for this enzyme, is similar in overall structure to homologous cathepsin B-like enzymes studied (Table S1), with the majority of the variation found in the occluding loop region (discussed below). Our crystal structure also reveals several interesting features that are atypical of a cathepsin B-like cysteine protease. Cathepsin B family members were originally defined in vertebrate systems as possessing an acidic residue at the bottom of the S2 subsite that allows for the accommodation of basic residues in the pocket. TbCatB has a Gly at this position, which opens up the pocket, allowing larger P2 substituents to be targeted to this part of the active site cleft. Homology modeling previously indicated an acidic functionality around the S2 subsite of TbCatB that may be able to take advantage of a positive charge at the P2 position of small molecule inhibitors. These acidic residues line the sides (Asp166^75) and bottom (Asp327^244) of the pocket. In our structure Asp258^175 are available for binding and in each copy of TbCatB interact with a glycerol molecule from the cryoprotectant solution.
The substrate binding sites of TbCatB and mammalian CatBs.
An acidic functionality around the S2 subsite of TbCatB.
Superimposition of TbCatB•CA074 and rhodesain•K11002 highlights structural differences that cause rhodesain to be more sterically restricted at the S2 subsite. Firstly, Asp166^75 in TbCatB is replaced by a Leu in rhodesain (Leu67); Asp166^75 in TbCatB is able to pack itself against helix α3, where it establishes a number of hydrogen bonding interactions. Leu67 packs more favorably against the hydrophobic environment of the rhodesain S2 subsite, together with the large hydrophobic phenylalanyl at the P2 position of K11002. This residue therefore points in toward the substrate-binding site in rhodesain. Secondly, rhodesain has the larger Ala208 (Gly328^245 in TbCatB) at the bottom of the S2 subsite, making the pocket shallower. Finally, the loop between strands β2 and β3 in rhodesain is anchored to an adjacent loop (between strands β5 and β6) by a disulfide bridge between Cys155 and Cys203. A number of direct and water-mediated hydrogen bonding interactions stabilize this conformation, and Gln159 and Leu160 are pulled into the S2 subsite to further narrow the pocket. In TbCatB, the β2–β3 loop lacks the cysteine required to form the anchoring disulfide bridge, and is glycine-rich (Gly269^186) when compared with rhodesain. The additional flexibility allows the C-terminal portion of the TbCatB loop to adopt a conformation similar to that found in human cathepsin B, removed from the S2 subsite and oriented towards the prime sites. Of note, the mobility of this loop was recently alluded to in homology modeling studies by Mallari et al. in comparison with human cathepsin L.
The S2 subsites of TbCatB and rhodesain.
TbCatB has an ‘occluding loop’, a unique feature of cathepsin B-like enzymes, which spans the prime side of the substrate binding site and distinguishes them from the cathepsin L-like enzymes. In TbCatB, the loop is three residues longer than in mammalian homologs and we note a dual peptide conformation between His194^110. The occluding loop in TbCatB further deviates from homologous structures between residues 206^120. Human, rat and bovine cathepsin B have an invariant “GEGD” motif in this region. The glycine residues flanking Glu122 confer additional flexibility in this region such that the negatively charged residue is able to flip in and out of the active site. The corresponding motif in TbCatB, “FNFD”, lacks this flexibility and both Phe208^121 stack with the N-terminal residue (Phe189^105) of the occluding loop, creating a more stable opening around S1′. This feature of the TbCatB occluding loop presents the possibility to engineer additional specificity into inhibitors targeting this enzyme. Indeed, Mallari et al. have shown that out of a series of 56 compounds, only those with a specific N9 substituent (hydroxypropyl) were reasonable human CatB inhibitors. The authors propose this may be due to the ability of this substituent to stabilize the flexible loop in a favorable conformation. This stabilizing interaction was not expected to be important in TbCatB; indeed, TbCatB was tolerant of a wide range of substitutions at this position on the inhibitor scaffold.
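The contrast between the “GEGD” and “FNFD” motifs described above can be made concrete with a minimal sketch. It uses only the two four-letter motifs quoted in the text and treats glycine on both sides of the second position as a crude proxy for local backbone flexibility. That proxy, and the function names `flanked_by_glycine` and `count_residue`, are illustrative assumptions and not part of the structural analysis reported here.

```python
# Illustrative sketch only: glycine flanking is used as a rough proxy for
# local backbone flexibility. This is an assumption, not the authors' method.

MOTIFS = {
    "mammalian cathepsin B": "GEGD",  # invariant motif quoted in the text
    "TbCatB": "FNFD",                 # corresponding motif quoted in the text
}


def flanked_by_glycine(motif: str, index: int = 1) -> bool:
    """Return True if the residue at `index` has glycine on both sides."""
    if index <= 0 or index >= len(motif) - 1:
        raise ValueError("index must point to an internal residue")
    return motif[index - 1] == "G" and motif[index + 1] == "G"


def count_residue(motif: str, residue: str) -> int:
    """Count occurrences of a one-letter residue code in the motif."""
    return motif.count(residue)


for enzyme, motif in MOTIFS.items():
    print(
        f"{enzyme}: {motif} "
        f"glycine-flanked={flanked_by_glycine(motif)} "
        f"Phe count={count_residue(motif, 'F')}"
    )
```

In this simplified view the mammalian motif has its second residue flanked by glycines while the TbCatB motif has two phenylalanines instead, which mirrors the flexible versus stacked arrangement described in the text without modeling any actual geometry.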
An interesting aspect of mammalian cathepsin B-like enzyme structure is the presence of two salt bridges (His110-Asp22 and Arg116-Asp224) that stabilize the “closed” conformation of the loop in the mature form. Mutations that disrupt either ion pair are correlated with a major increase in endopeptidase activity, presumably due to a corresponding increase in loop flexibility. While the His-Asp pair is conserved in TbCatB, Arg116 is replaced by Tyr202 and Asp224 is replaced by Glu307. In TbCatB, the acidic residue does not interact directly with Tyr202, but instead stabilizes the occluding loop at an insertion (relative to mammalian enzymes) through an interaction with Asn200. It is tempting to speculate on the role that these substitutions might play, if any, in altering the characteristic pH dependence of cathepsin B activity/inhibition. However, at present, we have no biochemical evidence to support this assumption, and clearly this is a point that requires further investigation through mutational analysis.
Interactions stabilizing the occluding loop in cathepsin B-like enzymes.
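The ion-pair comparison above can be tabulated in a short sketch. It assumes simplified side-chain charge classes (Asp and Glu acidic; Arg, Lys and His basic; everything else neutral) and ignores geometry and pH entirely, so it only indicates whether a pair of residue types could in principle form a salt bridge. The names `CHARGE_CLASS` and `can_form_ion_pair` are hypothetical and chosen for illustration.

```python
# Simplified charge classes (assumption: His is treated as potentially
# basic, as in the His110-Asp22 pair described in the text).
CHARGE_CLASS = {
    "Asp": "acidic",
    "Glu": "acidic",
    "Arg": "basic",
    "Lys": "basic",
    "His": "basic",
}


def can_form_ion_pair(res_a: str, res_b: str) -> bool:
    """Return True if one residue type is acidic and the other is basic."""
    classes = {
        CHARGE_CLASS.get(res_a, "neutral"),
        CHARGE_CLASS.get(res_b, "neutral"),
    }
    return classes == {"acidic", "basic"}


# Mammalian residue pairs and the residue types the text reports at the
# equivalent positions in TbCatB.
PAIRS = [
    (("His", "Asp"), ("His", "Asp")),  # His-Asp pair, conserved in TbCatB
    (("Arg", "Asp"), ("Tyr", "Glu")),  # Arg116 -> Tyr202, Asp224 -> Glu307
]

for mammalian, tbcatb in PAIRS:
    print(
        f"mammalian {mammalian[0]}-{mammalian[1]}: "
        f"{can_form_ion_pair(*mammalian)} | "
        f"TbCatB {tbcatb[0]}-{tbcatb[1]}: {can_form_ion_pair(*tbcatb)}"
    )
```

Under these assumptions the Tyr-Glu combination is flagged as unable to form an ion pair, consistent with the observation that the acidic residue in TbCatB interacts with Asn200 rather than with Tyr202.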
Cysteine proteases are expressed as inactive “zymogens” containing a “pro”-domain that aids in the proper folding of the full-length protein and suppresses the activity of the catalytic (mature) domain. Autoproteolysis results in cleavage between the pro and catalytic domains, yielding the fully active, mature enzyme. Comparison of this TbCatB mature domain with crystal structures of the mature domains of mammalian cathepsin B enzymes, as well as the mature domains of rhodesain and papain, shows that the TbCatB structure has an unusually long N-terminus. However, further comparison with parasite cysteine proteases reveals that an elongated N-terminus is shared with the malarial proteases falcipain-2 (FP-2) and falcipain-3 (FP-3). Superimposition with TbCatB reveals the N-terminal extension of these proteases to be of similar length to that found in our structure (16 residues in falcipain-2 and 18 residues in falcipain-3). Furthermore, the extension in TbCatB establishes several polar and hydrophobic interactions with the L and R domains of the main α/β fold (Figure S2B), as is observed in structures of FP-2 and FP-3 (although this results in the malarial extensions adopting more extended secondary structure). While comparisons can be drawn between TbCatB and the malarial proteases, the atypical N-terminus of FP-2 and FP-3 was already identified before the structures were known, including the lack of a typical papain-family mature cleavage site. Conversely, TbCatB does contain such a cleavage site and, contrary to our findings, residues upstream were expected to form part of the pro-domain. Our Edman sequencing data suggest the possibility of an even longer N-terminal extension (33 residues N-terminal to the predicted cleavage site ‘LPSS’). Analysis of the crystal packing in our TbCatB model suggests these residues may occupy a nearby solvent channel in the crystal and are therefore disordered. Alternatively, they may be lost during crystallization. Comparisons with the human and rat unactivated zymogens (PDB IDs 1MIR and 3PBH) show that, in the full-length “pro” form, the equivalent residues form a long loop and short helix that occlude the active site. The possibility of an additional 33 residues at the N-terminus of the mature TbCatB therefore remains an intriguing puzzle. While our sequencing data exclude the possibility of the recombinant enzyme being activated during yeast cell culture, we cannot exclude cellular activation of the endogenous enzyme as expressed by the native parasite. The Western blot data show the latter to be larger upon activation than predicted by sequence analyses but slightly smaller than the recombinant form. We can only speculate that the native enzyme undergoes further processing during expression in T. brucei. Future experiments will be directed towards shedding further light on the unusual processing of this parasite cysteine protease.