Interactions between cohesin and dockerin modules play a crucial role in the assembly of multienzyme cellulosome complexes. Although intraspecies cohesin and dockerin modules generally bind with high affinity but indiscriminately, cross-species binding is rare. Here, we combined ELISA-based experiments with Rosetta-based computational design to evaluate the contribution of distinct residues at the Clostridium thermocellum cohesin-dockerin interface to binding affinity, specificity, and promiscuity. We found that single mutations can have distinct and significant effects on binding affinity and specificity. In particular, mutations at cohesin position Asn37 show dramatic variability in their effect on dockerin binding affinity and specificity: the N37A mutant binds promiscuously both to cognate (C. thermocellum) and to non-cognate Clostridium cellulolyticum dockerin. N37L, in turn, switches binding specificity: compared with the wild-type C. thermocellum cohesin, this mutant shows a significantly increased preference for C. cellulolyticum dockerin combined with strongly reduced binding to its cognate C. thermocellum dockerin. The observation that a single mutation can overcome the naturally observed specificity barrier provides insights into the evolutionary dynamics of this system, which allows rapid modulation of binding specificity within a high affinity background.
What determines binding specificity, and how easily can this specificity be changed during evolution? These are fundamental questions regarding molecular interactions in biological systems. Nature manipulates binding specificity and affinity using different strategies (1), ranging from subtle changes such as point mutations in distinct regions of the interface (2) to more dramatic changes such as loop insertions or deletions (3, 4). Protein binding interfaces generally consist of independent patches formed by networks of interacting residues (5). Such patches can perform different tasks in the interaction (6).
Anaerobic, cellulose-degrading bacteria produce a sophisticated multiprotein complex, the cellulosome, that is optimized for efficient degradation of cellulose (7). The different components of the cellulosome are held together by interactions between two distinct modules: cohesin modules, which occur in repeats on the scaffoldin protein, and dockerin modules, which are attached to hydrolases or to additional scaffoldins (Fig. 1A). The cohesin-dockerin interaction (see Fig. 1B) is characterized by very high affinity (8, 9). The small ~70-residue dockerin module contains two F-hand motif repeats (10, 11) that allow binding of its cohesin partner in two symmetric orientations (see Fig. 1, C and D) (12, 13). The cohesin module folds into a β sandwich and binds to dockerin using one of its β sheets; binding specificity is achieved by the loops at the periphery of this sheet (12–16). Cohesin-dockerin interactions can be classified into three general types according to their sequence similarity (7). In C. thermocellum and C. cellulolyticum, enzyme-associated dockerins are generally grouped as type I, whereas dockerins that participate in attachment of scaffoldins to the cell belong to the type II family (represented in Fig. 1A as CohI-DocI and CohII-DocII, respectively). Within a given species, dockerins and cohesins of the same type may bind interchangeably, thereby increasing cellulosomal heterogeneity (notably its enzyme content). In contrast to this binding promiscuity among modules of the same species, interactions between dockerin and cohesin modules from different species have rarely been observed (Ref. 17; for an exception, see e.g. Sakka et al. (18)).
The cohesin-dockerin interaction is an ideal model system for the assessment of minimal changes needed for modulation of binding promiscuity and specificity. How many mutations are needed and where are they located? Rational manipulation of the cohesin-dockerin interaction is a precondition for more elaborate designs of multimolecular cellulosomes of specific architecture and composition in the future (19–21).
Previous studies identified specificity-determining residues in the dockerin module based on sequence conservation within and across different species (8, 22). In a gene-swapping experiment, Nakar et al. (23) switched the binding specificity of a cohesin from C. cellulolyticum dockerin to C. thermocellum dockerin by replacing only three residues. In their large scale assessment of correlated mutations, Halperin et al. (24) harnessed the ample information about intraspecies binding promiscuity and interspecies binding specificity in the cohesin-dockerin interaction to precisely identify the residues at its interface.
Here we used a structure-based approach to determine which residues contribute to the high affinity observed in this interaction and which in turn are important for binding specificity. We applied a combination of computational design and binding experiments to identify crucial features in the C. thermocellum cohesin that determine the structural and physical bases for binding affinity and specificity in the type I cohesin-dockerin interface. We characterized residues at the cohesin-dockerin interface by computational modeling using the Rosetta molecular modeling suite (25) and validated these predictions using indirect ELISA (iELISA) specifically developed by us to measure effects on binding of high affinity interactions (26, 27). Evaluation of the effect of these mutations on binding by a number of additional computational as well as experimental approaches allowed us to provide a robust picture of the main determinants of binding affinity and specificity of this interaction. Our study identified two types of hot spot residues in C. thermocellum cohesin: affinity hot spots such as Leu83 in the conserved hydrophobic patch of the cohesin-dockerin interface contribute significantly to binding affinity, whereas the specificity hot spot Asn37 in the hydrogen bond network in the C. thermocellum cohesin-dockerin interface plays a crucial role in determining binding specificity.
Throughout this study, we refer to the C. thermocellum complex between the second cohesin module in scaffoldin A (residues 183–322) and the dockerin connected to endo-1,4-β-xylanase Y (Xyn10B; residues 733–788) (Protein Data Bank code 1OHZ (12)) and to the C. cellulolyticum complex between the first cohesin module on scaffoldin C (residues 277–427) and the dockerin connected to endoglucanase A (Cel5A; residues 410–472, A16S/L17T mutant) (Protein Data Bank code 2VN6 (16)). Residue numbering refers to these structures. In this study, we report modeling studies based on the dockerin orientation that positions recognition residues Ser45/Thr46 and Ala47/Phe48 for C. thermocellum and C. cellulolyticum, respectively, at the interface. Structures of the inverse, symmetrical binding mode show a very similar interface (C. thermocellum, Protein Data Bank code 2CCL (13); C. cellulolyticum, Protein Data Bank code 2VN5 (16)) and provide similar results (data not shown).
Electron density maps cannot distinguish between the two possible orientations of the planar side chain amide groups of asparagine and glutamine. The positions of the asparagine side chain atoms Oδ1 and Nδ2 (and of the glutamine side chain atoms Oϵ1 and Nϵ2) are therefore usually assigned by optimizing the surrounding polar environment and the satisfaction of hydrogen bonds. The NQ-Flipper protocol (28) suggests that the Asn37 side chain in structure 1OHZ is incorrectly oriented. Therefore, in all computations we used a starting structure of the C. thermocellum cohesin-dockerin complex in which the Asn37 side chain had been flipped to optimally position its hydrogen bond donor and acceptor, as also suggested previously (29).
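At the heavy-atom level, flipping an Asn amide amounts to exchanging the coordinates assigned to Oδ1 and Nδ2. The Python sketch below illustrates only that operation; the helper name `flip_asn_amide`, the dictionary representation of atom coordinates, and the example coordinates are hypothetical, and this is not the NQ-Flipper algorithm, which additionally evaluates both orientations against the hydrogen-bonding environment.

```python
from typing import Dict, Tuple

Coord = Tuple[float, float, float]


def flip_asn_amide(side_chain: Dict[str, Coord]) -> Dict[str, Coord]:
    """Return a copy of an Asn side chain with the OD1 and ND2 positions exchanged.

    Rotating the planar amide group by 180 degrees about the CB-CG bond
    places each of OD1 and ND2 (approximately) where the other one was,
    so at the heavy-atom level the flip is an exchange of atom labels.
    Amide hydrogens, if present, would have to be rebuilt afterwards.
    """
    if "OD1" not in side_chain or "ND2" not in side_chain:
        raise KeyError("side chain must contain OD1 and ND2 atoms")
    flipped = dict(side_chain)  # shallow copy; coordinate tuples are immutable
    flipped["OD1"], flipped["ND2"] = side_chain["ND2"], side_chain["OD1"]
    return flipped


# Made-up coordinates (Å), for illustration only.
asn = {
    "CB": (0.0, 0.0, 0.0),
    "CG": (1.5, 0.0, 0.0),
    "OD1": (2.1, 1.1, 0.0),
    "ND2": (2.2, -1.2, 0.0),
}
flipped = flip_asn_amide(asn)
print(flipped["OD1"], flipped["ND2"])
```

Applying the flip twice restores the original assignment, which is a quick sanity check for any implementation of this step.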
The structure-based computational analysis of the cohesin-dockerin interaction was performed with the Rosetta modeling framework, in which structure optimization and sequence design rely on a stochastic Monte Carlo search with minimization and on an energy function dominated by tight, clash-free packing, burial of hydrophobic residues, and hydrogen bond satisfaction (25, 30–32). We used Rosetta version 2.3.0 throughout this study, i.e. Rosetta revision 12795 and Rosetta database revision 21964, unless mentioned otherwise. Rosetta is available free of charge to academic users.
We applied the Rosetta interface mode for alanine scanning to identify interface residue hot spots as well as for the design of sequences with modified affinity (33). The contribution of different interface residues to binding was evaluated by mutation to alanine and measuring the effect on binding, $\Delta\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{bind}}^{\mathrm{mutant}} - \Delta G_{\mathrm{bind}}^{\mathrm{WT}}$, where $\Delta G_{\mathrm{bind}} = G_{\mathrm{complex}} - G_{\mathrm{free\ protein\ A}} - G_{\mathrm{free\ protein\ B}}$. A threshold of ΔΔGbind ≥1.0 kcal/mol was used to select putative hot spot residues (as in e.g. Refs. 31 and 34). The ΔΔG predictions were calculated by introducing minimal changes in the structure as we would not expect major backbone changes (the evaluated residues and their neighbors are restricted: they lie within the β-sheet of the cohesin or within the helix or calcium binding loop of the dockerin). The starting structure was first optimized by minimizing the complex structure. Upon mutation, neighboring side chain atoms were minimized, but side chains were not repacked and therefore not allowed to rearrange significantly.
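The bookkeeping behind the alanine-scanning definitions above can be written out explicitly. The following Python sketch implements ΔGbind = Gcomplex − Gfree A − Gfree B, ΔΔGbind = ΔGbind(mutant) − ΔGbind(WT), and the ≥1.0 kcal/mol hot spot cutoff; the class and function names are hypothetical, and the example energies are invented for illustration rather than taken from this study.

```python
from dataclasses import dataclass

HOT_SPOT_THRESHOLD = 1.0  # kcal/mol, the cutoff used in this study


@dataclass
class BindingEnergies:
    g_complex: float  # energy of the complex
    g_free_a: float   # energy of free partner A (e.g. cohesin)
    g_free_b: float   # energy of free partner B (e.g. dockerin)

    def dg_bind(self) -> float:
        """ΔG_bind = G_complex - G_free A - G_free B."""
        return self.g_complex - self.g_free_a - self.g_free_b


def ddg_bind(wt: BindingEnergies, mutant: BindingEnergies) -> float:
    """ΔΔG_bind = ΔG_bind(mutant) - ΔG_bind(WT); positive values mean weaker binding."""
    return mutant.dg_bind() - wt.dg_bind()


def is_hot_spot(ddg: float, threshold: float = HOT_SPOT_THRESHOLD) -> bool:
    """Flag a putative hot spot when ΔΔG_bind >= threshold."""
    return ddg >= threshold


# Hypothetical energies for illustration only (not values from this study).
wt = BindingEnergies(g_complex=-500.0, g_free_a=-240.0, g_free_b=-240.0)
mutant = BindingEnergies(g_complex=-495.0, g_free_a=-238.5, g_free_b=-240.0)
ddg = ddg_bind(wt, mutant)
print(f"ddG_bind = {ddg:.1f}, hot spot: {is_hot_spot(ddg)}")
```

Keeping the three energy terms separate makes it easy to see whether a predicted ΔΔGbind comes from destabilizing the complex or from changing the free mutated partner.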
The command line consists of:
where <INPUT_PDB> can be an experimentally solved structure or a model obtained by docking (see below), -interface invokes the interface mode, the mutations to model (e.g. N37D and N37A/D39A) are specified in the file defined by -mutlist, -output_structure writes the structures modeled during the simulation into Protein Data Bank format files, and the results specifying ΔG and energy values of the wild type and mutants are written to the output file specified by -intout. Torsion angles (both backbone and side chain) and rigid body orientation were minimized prior to the modeling of the mutation.
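The mutation shorthand used throughout this work (single mutants such as N37D, multiple mutants joined by a slash such as N37A/D39A) can be parsed mechanically. The Python sketch below is a hypothetical helper for that notation only; it does not reproduce the format of the Rosetta -mutlist file, which is not given in this text.

```python
import re
from typing import List, Tuple

# One-letter wild-type residue, sequence position, one-letter replacement.
_POINT_MUTATION = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def parse_mutations(spec: str) -> List[Tuple[str, int, str]]:
    """Split a shorthand like 'N37A/D39A' into (wild type, position, mutant) tuples."""
    mutations = []
    for token in spec.split("/"):
        match = _POINT_MUTATION.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"not a point mutation: {token!r}")
        wild_type, position, replacement = match.groups()
        mutations.append((wild_type, int(position), replacement))
    return mutations


print(parse_mutations("N37A/D39A"))
```

Rejecting malformed tokens early, rather than silently skipping them, helps avoid modeling a different mutant than intended.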
We compared this protocol (Rosetta 2.3) with a parameterized protocol for modeling the effects of point mutations (termed here Rosetta 3.0* and described in detail in Kellogg et al. (34) as protocol16). For these calculations, we used Rosetta revision 34507 and Rosetta database version 40221. The main differences between the two protocols are that the latter repacks all interface side chains, uses optimized weights for the scoring function, and combines soft-repulsive repacking with subsequent standard (hard-repulsive) minimization of both backbone and side chain atoms under constraints that tether the structure to its starting conformation.
Protocol16 consists of the following command line:
The supplied resfile allows all residues to be repacked, keeps the input side chain conformation among the candidate rotamers, and draws from a rotamer library with extended χ1 and χ2 angle sampling. As an example, the resfile for mutation N37A would contain the following:
In short, this protocol first repacks the input structure and then introduces the mutation of interest. Starting from both the wild-type and the mutated structure, 50 independent runs of minimization over all torsional and rigid body degrees of freedom are performed for each. ΔΔG is then calculated as the energy difference between the lowest-energy conformation of the mutant and that of the wild type (each selected from its 50 runs).
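The selection step of this protocol, taking the lowest energy among 50 independent runs for each structure and subtracting, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a "minimizer" is modeled as a hypothetical callable mapping a random seed to a final energy, and `toy_minimizer` is an invented stand-in for an actual Rosetta minimization run.

```python
import random
from typing import Callable

N_RUNS = 50  # independent minimization runs per structure, as described above

# A "minimizer" here is any callable mapping a random seed to a final energy.
Minimizer = Callable[[int], float]


def best_energy(minimize: Minimizer, n_runs: int = N_RUNS) -> float:
    """Lowest energy over n_runs independent minimizations (one seed per run)."""
    return min(minimize(seed) for seed in range(n_runs))


def ddg_lowest_energy(minimize_wt: Minimizer, minimize_mut: Minimizer,
                      n_runs: int = N_RUNS) -> float:
    """ΔΔG = lowest mutant energy - lowest wild-type energy."""
    return best_energy(minimize_mut, n_runs) - best_energy(minimize_wt, n_runs)


def toy_minimizer(baseline: float) -> Minimizer:
    """Hypothetical stand-in for a Rosetta run: baseline energy plus seeded noise."""
    def run(seed: int) -> float:
        return baseline + random.Random(seed).gauss(0.0, 0.5)
    return run


ddg = ddg_lowest_energy(toy_minimizer(-20.0), toy_minimizer(-18.5))
print(f"ddG = {ddg:.2f}")
```

Because both toy minimizers reuse the same seeds, their noise cancels and the sketch returns the baseline difference of 1.5; with real, independent runs the lowest-energy estimates carry sampling noise, which is why many runs are performed per structure.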
In addition, we compared the Rosetta-based protocols with the following approaches: 1) FoldX (35) version 3.0 Beta3, run locally with default values; 2) the Orbit interface energy function EPPI-2 (36), in two implementations in which EPPI-2 values were transformed to optimize the correlation with either a large set of 404 experimental ΔΔGbind values (Orbit1) or a restricted set of 53 designed mutations (Orbit2) (corresponding to Fig. 3, C and F, in Ref. 36, respectively); 3) Hunter (37); and 4) Concoord/PBSA (38).
A model of the non-cognate interaction was generated using as partners the C. thermocellum cohesin structure from Protein Data Bank structure 1OHZ (12) where the residue Asn37 was truncated to alanine (N37A) and the C. cellulolyticum dockerin structure from Protein Data Bank codes 2VN5 and 2VN6 (16).
We used RosettaDock (with off-rotamer minimization of side chain conformations (30, 39, 40)) to model the structure of non-cognate cohesin-dockerin pairs, starting from the monomer structures taken from the cognate C. thermocellum and C. cellulolyticum complex structures (see above). Because of the dual binding mode of cohesin-dockerin interactions, either the first (Ala16-Leu17) or the second (Ala47-Phe48) recognition motif of the C. cellulolyticum dockerin can interact with the cohesin module (Protein Data Bank codes 2VN5 and 2VN6, respectively (16)). We therefore considered both orientations when modeling the complex between the C. thermocellum cohesin mutant N37A and the C. cellulolyticum dockerin. In our calculations, the second orientation yielded the more favorable ΔG (−20.5 versus −17.0 Rosetta energy units) and was therefore chosen for subsequent analysis.
Docking was performed as described previously (40). The docking command line consists of the following:
-dock_pert 3 8 8 starts the refinement protocol from a slightly perturbed structure to sample the local energy landscape: translations are drawn from Gaussian distributions with a standard deviation (S.D.) of 3 Å along the axis connecting the centers of mass and 8 Å along the two perpendicular axes, and rotations around the three Euler angles are drawn from a Gaussian distribution with an S.D. of 8° (30); -dock_mcm selects the docking protocol that uses Monte Carlo sampling with minimization of the rigid body orientation; -dock_rtmin allows sampling of off-rotamer side chain conformations; -unbound_rot adds, for each position, the side chain conformation found in the free (i.e. starting) monomer to the rotamer library and assigns it minimal energy to bias sampling toward this conformation; -ex1 and -ex2aro include additional rotamers (±1 S.D. around χ1 for all amino acids and around χ2 for aromatic residues); -dock_score_norepack rescores the starting structure without repacking and minimizing it; -nstruct 1000 generates 1000 independent models; and -scorefile XXXX.score sets the name of the score output file.
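The random starting perturbation described for -dock_pert 3 8 8 can be illustrated by sampling from the stated Gaussian distributions. The Python sketch below only draws the translation and rotation offsets; it is not RosettaDock code, and how RosettaDock composes the rotation and applies the move to the docking partner is not reproduced here. The function name `perturb_rigid_body` is hypothetical.

```python
import random
import statistics
from typing import Tuple

Vector3 = Tuple[float, float, float]


def perturb_rigid_body(rng: random.Random, sd_axis: float = 3.0,
                       sd_perp: float = 8.0,
                       sd_rot_deg: float = 8.0) -> Tuple[Vector3, Vector3]:
    """Draw one random rigid-body move with the spreads described for -dock_pert 3 8 8.

    Returns (translation, rotation):
      translation (Å) = (along the axis joining the centers of mass,
                         perpendicular axis 1, perpendicular axis 2)
      rotation (degrees) = offsets of the three Euler angles
    """
    translation = (rng.gauss(0.0, sd_axis), rng.gauss(0.0, sd_perp), rng.gauss(0.0, sd_perp))
    rotation = (rng.gauss(0.0, sd_rot_deg), rng.gauss(0.0, sd_rot_deg),
                rng.gauss(0.0, sd_rot_deg))
    return translation, rotation


rng = random.Random(0)  # fixed seed for reproducibility
moves = [perturb_rigid_body(rng) for _ in range(20000)]
sd_along = statistics.pstdev([t[0] for t, _ in moves])
sd_across = statistics.pstdev([t[1] for t, _ in moves])
sd_rot = statistics.pstdev([r[0] for _, r in moves])
print(f"S.D. along axis {sd_along:.2f} Å, across {sd_across:.2f} Å, rotation {sd_rot:.2f}°")
```

The empirical standard deviations of many draws should approach 3 Å, 8 Å, and 8°, which is a simple way to check that the sampled spreads match the flag values.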
In a preceding step, the free monomers were prepacked to remove any internal clashes. In this step, the same sampling of side chain conformations was allowed. The command line consists of the following:
where -prepack_rtmin allows sampling of off-rotamer side chain conformations, and the other parameters are as described above.
Plasmid cassettes of the fusion proteins xylanase-fused dockerin (XynDoc) and carbohydrate-binding module-borne cohesin (CBM-Coh) were produced as described earlier (29, 41, 42). We used the cohesin constructs coh2-CBM of C. thermocellum scaffoldin CipA (the second cohesin module followed by the cellulose-binding module) and miniCipC of C. cellulolyticum scaffoldin C (CBM, hydrophilic domain, and the first cohesin module), and the dockerin constructs C. thermocellum XynDocS (C. thermocellum Cel48S dockerin) and C. cellulolyticum XynDocA (C. cellulolyticum Cel5A dockerin). Mutations of cohesin residues were introduced by site-directed mutagenesis using the QuikChange kit (Stratagene) and verified by DNA sequencing. Mutant cohesin fragments were digested with BamHI and XhoI and ligated into pET28a digested with the same enzymes. The final construct was verified by sequencing.
Wild-type and mutant CBM-Coh constructs were expressed and purified by affinity chromatography on a cellulose resin as described by Barak et al. (41). XynDocs were expressed and purified as described previously (41) with the following changes. The proteins were expressed in Escherichia coli BL21 in LB medium supplemented with 50 μg/ml kanamycin. The culture was grown at 37 °C to an A600 of ~0.6–0.8 and then induced with 1 mM isopropyl 1-thio-β-d-galactopyranoside for 3 h at 37 °C. The dockerins were purified on a nickel-nitrilotriacetic acid column as described, without heat treatment. Purity of the CBM-Coh and XynDoc proteins was estimated by analytical gel filtration on a Superdex 200 column. Protein concentration was determined by absorbance.
The relative binding of the cohesin mutants was estimated using the iELISA-based method described in detail previously (26) and summarized here. Different concentrations (1 pM–1 μM) of cohesin mutants were incubated with 100 pM wild-type Xyn-dockerin for 1 h at 37 °C in binding buffer (TBS, 10 mM CaCl2, 0.05% Tween 20, 2% BSA). Next, 100 μl of the mixture was transferred to a 96-well MaxiSorp plate (Nunc A/S, Roskilde, Denmark) coated with wild-type cohesin and incubated for 15–30 min at 37 °C. The plates were then washed with washing buffer (TBS, 10 mM CaCl2, 0.05% Tween 20), followed by 1-h incubation with rabbit anti-Xyn antibodies, washing, and another 1-h incubation with HRP-conjugated goat anti-rabbit antibodies.
The relative binding of the cohesin mutants to C. cellulolyticum dockerin was measured as in previous studies, with a few modifications: 300 pM C. cellulolyticum dockerin was incubated with different concentrations (1 pM–1 μM) of cohesin mutants, and plates coated with the C. thermocellum cohesin N37A mutant were used. This allows the differences in binding affinity of C. cellulolyticum dockerin for WT C. cellulolyticum, WT C. thermocellum, and mutant C. thermocellum cohesin to be shown in one plot.
The obtained binding data were analyzed using GraphPad Prism (version 5.00 for Windows; GraphPad Software, San Diego, CA). Wild-type cohesin binding was used to normalize the experimental scale, and all mutants were standardized according to the wild type. The results were fitted to a sigmoidal dose-response curve (43), and changes in free energy of binding (ΔΔGbind_exp) were calculated relative to the wild type according to the following equation:
Protein cellulose microarray experiments were performed as described previously (17). In short, WT C. thermocellum, WT C. cellulolyticum, and mutant N37A C. thermocellum cohesin modules were printed in rows of 2-fold serial dilutions. The slides were incubated with either C. thermocellum XynDocS or C. cellulolyticum XynDocA, and binding was visualized using a Cy5-labeled anti-xylanase antibody.
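The sigmoidal dose-response fit of the iELISA data described above (performed in GraphPad Prism) can be approximated outside Prism. The Python sketch below rests on stated assumptions: the decreasing four-parameter logistic form, with top and bottom fixed at 1 and 0 for wild-type-normalized signal, is our assumption about the model, a coarse least-squares grid search stands in for Prism's nonlinear regression, and the data are synthetic. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def logistic(log_conc: float, log_ic50: float, hill: float,
             top: float = 1.0, bottom: float = 0.0) -> float:
    """Decreasing sigmoidal dose-response curve on a log10(concentration) axis."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_conc - log_ic50) * hill))


def fit_log_ic50(log_concs: Sequence[float], signals: Sequence[float],
                 log_ic50_grid: Sequence[float],
                 hill_grid: Sequence[float]) -> Tuple[float, float]:
    """Least-squares grid search for (log10 IC50, Hill slope), top=1 and bottom=0 fixed."""
    best_sse, best_params = float("inf"), (float("nan"), float("nan"))
    for log_ic50 in log_ic50_grid:
        for hill in hill_grid:
            sse = sum((y - logistic(x, log_ic50, hill)) ** 2
                      for x, y in zip(log_concs, signals))
            if sse < best_sse:
                best_sse, best_params = sse, (log_ic50, hill)
    return best_params


# Synthetic, noise-free data over the 1 pM to 1 μM range (log10 M from -12 to -6).
log_concs: List[float] = [float(x) for x in range(-12, -5)]
signals = [logistic(x, log_ic50=-9.0, hill=1.0) for x in log_concs]
grid_ic50 = [round(-12.0 + 0.05 * i, 2) for i in range(121)]
grid_hill = [round(0.5 + 0.1 * i, 1) for i in range(16)]
log_ic50, hill = fit_log_ic50(log_concs, signals, grid_ic50, grid_hill)
print(f"IC50 = {10 ** log_ic50:.2e} M, Hill slope = {hill}")
```

A grid search is transparent but coarse; with noisy measurements a gradient-based nonlinear least-squares fit, as in Prism, gives finer estimates and confidence intervals.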
The structure of the C. thermocellum cohesin-dockerin interface contains a hydrophobic patch that is conserved in both C. thermocellum and C. cellulolyticum cohesin-dockerin interactions (Figs. 1C and 2A) as well as an extensive network of hydrogen bonds at the center of the interface (Figs. 1D and 3A). The C. thermocellum dockerin residues Ser45 and Thr46 located in the conserved binding motif of the second F-hand constitute a central part of this network (Fig. 3B). In the corresponding C. cellulolyticum dockerin, this binding motif is replaced by hydrophobic residues Ala47 and Phe48 (Fig. 1D and Ref. 16).
To locate interface residues that play a critical role in binding, we first identified putative interface hot spot residues in the C. thermocellum cohesin-dockerin interface with computational alanine scanning using Rosetta (version 2.3). Binding hot spot residues (calculated ΔΔGbind ≥1.0 kcal/mol) are located in 1) the hydrophobic patch (cohesin Leu83 and dockerin Leu22), 2) the network of hydrogen bonds (cohesin Asn37, Asp39, and Glu131 and dockerin Ser45 and Thr46), and 3) other regions of the interface (e.g. cohesin Tyr74 and a conserved intermolecular salt bridge between cohesin Glu86 and dockerin Arg53). Table 1 summarizes predicted and experimental ΔΔG values evaluated in this as well as previous studies.
The hydrophobic patch is centered around the conserved C. thermocellum cohesin residue Leu83 (Figs. 1C and 2A). The strong conservation of this exposed hydrophobic residue suggests an important functional role in the interaction with dockerin partners. To investigate its contribution to binding affinity, we replaced Leu83 with smaller residues (mutations L83A and L83S; see Table 1), which create a void in the hydrophobic interface patch and are therefore predicted to significantly affect binding. We used iELISA (26, 27) to measure the effect of these mutations on the affinity of the C. thermocellum cohesin-dockerin interaction (see “Experimental Procedures”). The IC50 of the wild-type C. thermocellum cohesin-dockerin interaction was 1 nM, whereas the IC50 values of the L83A and L83S mutants were 123 and 257 nM, respectively, corresponding to losses in binding free energy (ΔΔG) of 3.0 and 3.4 kcal/mol (Fig. 2B and Table 1). Both mutations indeed significantly impair binding. Consistent with this result, mutation to alanine of the corresponding residue in C. cellulolyticum cohesin, Leu87, was shown previously to significantly decrease binding (16). These experimental data validate the crucial role of Leu83 in the interaction of the C. thermocellum cohesin with its cognate dockerin and highlight the importance of the hydrophobic patch at this interface.
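The equation used to convert IC50 values into ΔΔG is referred to under “Experimental Procedures” but is not reproduced in this text. A common choice, and an assumption here, is ΔΔG = RT ln(IC50,mutant/IC50,WT). Evaluated at the 37 °C incubation temperature, it recovers the 3.0 and 3.4 kcal/mol quoted above from IC50 values of 1, 123, and 257 nM (at 25 °C the same formula would give about 2.9 and 3.3 kcal/mol). The function name `ddg_from_ic50` is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant, kcal/(mol·K)
T_INCUBATION_K = 310.15          # 37 °C, the iELISA incubation temperature (assumed)


def ddg_from_ic50(ic50_mutant: float, ic50_wt: float,
                  temp_k: float = T_INCUBATION_K) -> float:
    """Assumed relation ΔΔG = RT ln(IC50_mutant / IC50_WT), in kcal/mol.

    Both IC50 values must be in the same units; positive ΔΔG means weaker binding.
    """
    if ic50_mutant <= 0 or ic50_wt <= 0:
        raise ValueError("IC50 values must be positive")
    return R_KCAL_PER_MOL_K * temp_k * math.log(ic50_mutant / ic50_wt)


IC50_WT_NM = 1.0
for name, ic50_nm in [("L83A", 123.0), ("L83S", 257.0)]:
    print(f"{name}: {ddg_from_ic50(ic50_nm, IC50_WT_NM):.1f} kcal/mol")
# L83A: 3.0 kcal/mol
# L83S: 3.4 kcal/mol
```

Because only the ratio of IC50 values enters, any consistent concentration unit gives the same ΔΔG.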
The network of hydrogen bonds at the interface of the C. thermocellum cohesin-dockerin complex is centered on the specificity-determining dockerin residues Ser45 and Thr46 and on cohesin residues Asn37, Asp39, and Glu131 (Figs. 1D and 3, A and B). Our structure-based analysis and energy calculations suggest that mutations to alanine at these positions will reduce binding affinity significantly (ΔΔG > 1.0 kcal/mol) by disrupting the network of hydrogen bonds and by modifying the electrostatic attraction between the overall negatively charged cohesin and positively charged dockerin modules (Table 1).
Using iELISA experiments to measure the binding of the cohesin single mutants N37A, D39A, and E131A to wild-type C. thermocellum dockerin, we confirmed that the D39A and E131A mutations indeed significantly impair binding to the C. thermocellum dockerin (ΔΔG > 4.0 kcal/mol for D39A and 1.8 kcal/mol for E131A). Surprisingly, we observed no such effect for N37A (Fig. 3C and Table 1). In addition, the N37A mutation was found to reduce the effect of the E131A mutation in the polar patch: the double mutant N37A/E131A (ΔΔG = 1.0 kcal/mol) had less impact on binding than the single mutant E131A (ΔΔG = 1.8 kcal/mol) (Fig. 3C and Table 1).
We next investigated two additional mutations in the polar patch, N37D and D39N. Each of these mutations changes only a single side chain atom, but they alter the overall charge and affect the network of hydrogen bonds (Fig. 3B). Our calculations predict that N37D will be detrimental to binding, whereas D39N will not affect binding affinity (Table 1). iELISA experiments confirmed the predicted detrimental effect of the N37D mutant on binding (ΔΔG = 2.5 kcal/mol). Contrary to our predictions, however, the iELISA experiment showed an even stronger reduction of binding for the D39N mutant (ΔΔG > 4 kcal/mol) (Fig. 3D and Table 1). Together, these two mutations, N37D and D39N, highlight the sensitivity of this interface to both addition and removal of negative charges.
For the mutations tested by iELISA in this study, predicted and experimentally measured effects on binding agreed well overall, except for the mutations N37A and D39N (Table 1). Why does our model fail to capture the effect of these two mutations, and are other protocols more successful? We recalculated the predicted effects on binding using a range of other published and available approaches, namely FoldX (35), Hunter (37), two versions of the Orbit function EPPI-2 optimized for interface prediction (36) (termed here Orbit1 and Orbit2), and CC/PBSA (38) (see “Experimental Procedures”). In addition, we repeated the prediction with a more recent Rosetta protocol calibrated specifically for predicting the effect of mutations on protein stability (termed here Rosetta 3.0*; protocol16 in Kellogg et al. (34)). The results of these approaches are summarized in Fig. 4. In concordance with the results described above, all approaches identify the strong effect on binding of mutating Leu83 to serine or alanine. However, for mutations of the polar residues Asp39 and Asn37, the protocols agree less on the effect on binding, and none correctly describes the whole set. Hunter does not predict a significant effect on binding for any of these mutations (including N37A). Among the other methods, only Rosetta 3.0* captures the lack of destabilization by N37A. It predicts that the hydrogen bonds formed by Asn37 contribute little energy because of their non-optimal geometry (compared e.g. with the hydrogen bonds formed by Asp39) and therefore cannot compensate for the desolvation penalty of burying Asn37; consequently, mutation to alanine at this position does not significantly affect binding affinity. The strong effect of D39N is also missed by most of the other approaches (including CC/PBSA, which aims at accurately modeling electrostatic effects), except for Orbit1. This protocol, however, predicts a similar effect for all mutations, so a slight modification of the threshold used to define a hot spot (1 kcal/mol) strongly affects its results.
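The fragility of a hard 1.0 kcal/mol cutoff is easy to see with values that sit near it. The Python sketch below classifies the experimental ΔΔG values for binding to C. thermocellum dockerin reported above (values reported as “>4” are entered as 4.0, and N37A is omitted because no numeric value is given in the text); it uses measured rather than Orbit1-predicted values, which are not listed here, and the helper name `above_cutoff` is hypothetical.

```python
from typing import Dict, List

# Experimental ΔΔG values (kcal/mol) for binding to C. thermocellum dockerin, as
# quoted in the text; values reported as ">4" are entered as the bound 4.0.
MEASURED_DDG: Dict[str, float] = {
    "L83A": 3.0, "L83S": 3.4, "D39A": 4.0, "E131A": 1.8,
    "N37A/E131A": 1.0, "N37D": 2.5, "D39N": 4.0,
}


def above_cutoff(ddgs: Dict[str, float], cutoff: float) -> List[str]:
    """Mutations whose ΔΔG meets or exceeds the cutoff, sorted by name."""
    return sorted(name for name, ddg in ddgs.items() if ddg >= cutoff)


for cutoff in (0.9, 1.0, 1.1):
    selected = above_cutoff(MEASURED_DDG, cutoff)
    print(f"cutoff {cutoff}: {len(selected)} of {len(MEASURED_DDG)} selected")
```

Raising the cutoff from 1.0 to 1.1 kcal/mol drops the N37A/E131A double mutant, which lies exactly on the threshold; a predictor that assigns similar values to all mutations is affected far more strongly by such shifts.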
We reconfirmed that the C. thermocellum cohesin mutation N37A does not affect binding to its cognate dockerin by independently evaluating the binding of wild-type C. thermocellum cohesin and of the C. thermocellum cohesin N37A mutant to C. thermocellum and C. cellulolyticum dockerins using a cellulose-binding microarray assay (17). Surprisingly, the assay revealed that, unlike the wild-type C. thermocellum cohesin, the N37A mutant also binds to non-cognate C. cellulolyticum dockerin (Fig. 5A). Thus, the cross-species binding barrier observed for wild-type C. thermocellum and C. cellulolyticum cohesin-dockerin pairs is partially overcome by the C. thermocellum cohesin mutant N37A, which binds promiscuously to both C. thermocellum and C. cellulolyticum dockerins.
We further confirmed this non-cognate binding with the corresponding iELISA assay. We compared the affinity of C. cellulolyticum dockerin for the promiscuous C. thermocellum cohesin mutant N37A with its affinity for both its cognate (C. cellulolyticum) and non-cognate (C. thermocellum) wild-type cohesin modules (Fig. 5B). The results indicate that, in strong contrast to wild-type C. thermocellum cohesin, which does not bind the C. cellulolyticum dockerin, the C. thermocellum N37A mutant cohesin indeed binds to C. cellulolyticum dockerin. Still, the C. thermocellum N37A mutant cohesin bound C. cellulolyticum dockerin with lower affinity than the cognate wild-type C. cellulolyticum cohesin did (ΔΔG = +1.3 kcal/mol; Fig. 5B and Table 2). We conclude from these two experiments that mutation of a single amino acid residue in the C. thermocellum cohesin creates a promiscuous cohesin module that overcomes the specificity barrier and also binds to C. cellulolyticum dockerin.
We note that in this iELISA experiment, to measure the amount of non-bound dockerin, the preformed complexes were incubated on plates coated with C. thermocellum N37A cohesin rather than with wild-type C. cellulolyticum cohesin. This optimized our ability to distinguish between the affinities of the three cohesin modules (WT C. cellulolyticum, WT C. thermocellum, and C. thermocellum N37A) for C. cellulolyticum dockerin (see “Experimental Procedures” and “Discussion”).
A structural model of the C. thermocellum cohesin N37A mutant bound to its non-cognate C. cellulolyticum dockerin (Fig. 5C) suggests how the mutant could accommodate the hydrophobic recognition motif (Ala47/Phe48) of C. cellulolyticum dockerin. In the cognate C. cellulolyticum interaction, dockerin specificity motif residue Phe48 points into the interface to interact with the side chains of Ala129 and Lys137 and the backbone of Met135 of C. cellulolyticum cohesin (Fig. 1D). In the non-cognate interaction, by contrast, Phe48 points in a different direction, toward a newly created hydrophobic patch formed by C. thermocellum cohesin residue Leu129 and C. cellulolyticum dockerin residues Ile52 and Asn40 (Fig. 5C). In the C. cellulolyticum cohesin-dockerin interaction, Ile52 is exposed and only partly covered by dockerin residue Lys55 (Fig. 5D and Ref. 16), whereas in the model of the interaction between the C. thermocellum cohesin N37A mutant and the C. cellulolyticum dockerin, Lys55 moves away into the solvent to make room for C. thermocellum cohesin Leu129 (Fig. 5C).
If a single mutation overcomes the species barrier and allows cohesin to bind to non-cognate dockerin, how difficult would it be to keep this new interaction and abolish the original cognate interaction, i.e. to create a switch in binding specificity? We designed two mutants, single mutant N37L and double mutant N37A/D39A, based on the solved structures of the C. thermocellum and C. cellulolyticum cognate cohesin-dockerin complexes (Protein Data Bank codes 1OHZ (12) and 2VN6 (16), respectively). Although the large polar asparagine side chain at position 37 suits its polar environment in the C. thermocellum cognate complex, it would not fit the corresponding hydrophobic environment in the C. cellulolyticum complex, which may explain why wild-type C. thermocellum cohesin fails to bind non-cognate C. cellulolyticum dockerin. The N37L mutant, in turn, is predicted to prefer the hydrophobic non-cognate C. cellulolyticum dockerin over the polar cognate C. thermocellum dockerin (Fig. 6A). Experimental validation of this prediction with the iELISA assay confirmed that the N37L C. thermocellum cohesin mutant indeed barely binds the cognate C. thermocellum dockerin (ΔΔG > 4 kcal/mol compared with WT C. thermocellum cohesin; Fig. 6B and Table 1). Instead, it now binds the non-cognate C. cellulolyticum dockerin (ΔΔG = 1.7 kcal/mol compared with WT C. cellulolyticum cohesin), with slightly lower affinity than N37A (ΔΔG = 1.3 kcal/mol; Fig. 6C and Table 2).
The double mutant N37A/D39A shows a binding pattern similar to that of N37L: impaired binding to its cognate C. thermocellum dockerin (ΔΔG > 4 kcal/mol; Table 1) and new binding to the non-cognate C. cellulolyticum dockerin (ΔΔG = 1.5 kcal/mol; Fig. 6, D–F, and Table 2). In this case, a mutation that abolishes the original binding (D39A) was combined with a mutation that creates new, non-cognate binding (N37A), resulting in a mutant that has lost its strong original binding but gained the ability to bind a new, non-cognate partner. Note, however, that for both the N37L and the N37A/D39A mutants, which switch their binding specificity from C. thermocellum dockerin to C. cellulolyticum dockerin, the preferences of the dockerins for cohesins have not changed (Fig. 6B): C. cellulolyticum dockerin still binds preferentially to its cognate C. cellulolyticum cohesin rather than to the non-cognate C. thermocellum cohesin mutants.
In this study, we used a structure-based approach to study binding specificity and promiscuity in the cohesin-dockerin interaction. Our main finding is that the balance between specificity and promiscuity can be controlled by a single residue. We discuss this easy disruption of specificity in light of other studies on protein binding specificity and affinity and their evolution, and identify similarities and differences at both the functional and structural levels. Finally, we discuss challenges and successes of current state-of-the-art modeling protocols and experimental assays aimed at accurately characterizing binding affinity and specificity.
The cohesin-dockerin interaction presents a special case of a high affinity interaction whose binding promiscuity and specificity can be strongly affected by a single mutation. Our computational results, confirmed by two independent experimental assays, demonstrate the central role that residue Asn37 of C. thermocellum cohesin can play in binding specificity. Whereas wild-type C. thermocellum cohesin binds only C. thermocellum dockerin, mutation N37A abolishes this specificity and extends binding to non-cognate C. cellulolyticum dockerin, and mutation N37L completes a specificity switch toward C. cellulolyticum dockerin. Residue Asn37 can thus be termed a binding specificity hot spot. Interestingly, mutation N37A also attenuates the destabilizing effects of the D39A and E131A mutations in the polar patch of the interface (Fig. 3C and Table 1), indicating that specificity comes at the price of increased sensitivity to mutations (i.e. the interaction with dockerin is less sensitive to mutations for the promiscuous mutant N37A than for the specific wild-type cohesin).
The new non-cognate binding can be explained by the distinct features of C. thermocellum and C. cellulolyticum dockerins: the polar binding motif in the second F-hand of C. thermocellum dockerin contains Ser45 and Thr46, whereas C. cellulolyticum dockerin has Ala47 and Phe48 at the corresponding positions (Fig. 1D). The accommodation of the Phe48 side chain in an existing hydrophobic patch (Fig. 5C), without the need for further evolutionary changes at the interface, highlights the preorganized plasticity of interfaces, similar to what has been observed in other experimental studies of the evolution of new binding specificities (2).
Rather than affecting binding affinity on its own, the N37A mutation attenuates the effects of additional mutations on binding affinity. Consequently, rather than an interface binding hot spot, this residue functions as an interface specificity hot spot. We have observed similar behavior for a cohesin residue in the type III cohesin-dockerin interaction which, upon mutation, increased binding and attenuated the effects of mutations at other interface positions.5 Thus, mutations that do not impair binding but instead extend the range of binding partners to dockerins from other species (here, non-cognate C. cellulolyticum dockerin) can readily arise during evolution.
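The attenuation described above is a form of non-additivity between mutations. A conventional way to quantify such non-additivity, not necessarily the analysis used in this study, is a double-mutant cycle: the coupling energy is the observed ΔΔG of the double mutant minus the sum of the two single-mutant ΔΔG values, all measured relative to wild type. The sketch below assumes ΔΔG = ΔG(mutant) − ΔG(wild type), so positive values mean weaker binding; the function names and the 0.5 kcal/mol tolerance are hypothetical choices for illustration.

```python
def coupling_energy(ddg_a: float, ddg_b: float, ddg_ab: float) -> float:
    """Double-mutant-cycle coupling energy in kcal/mol.

    All inputs are binding ddG values relative to wild type, defined as
    dG(mutant) - dG(wild type), so positive values mean weaker binding.
    Returns ddG_AB - (ddG_A + ddG_B); zero means the mutations act additively.
    """
    return ddg_ab - (ddg_a + ddg_b)


def describe_coupling(ddg_int: float, tolerance: float = 0.5) -> str:
    """Classify a coupling energy.

    The 0.5 kcal/mol tolerance is an arbitrary, hypothetical threshold for
    treating small deviations as experimental noise.
    """
    if ddg_int < -tolerance:
        # The double mutant binds better than the sum of single effects
        # predicts, e.g. one mutation buffering the other.
        return "sub-additive (double mutant less destabilized than predicted)"
    if ddg_int > tolerance:
        return "super-additive (double mutant more destabilized than predicted)"
    return "approximately additive"
```

In this framing, a mutation such as N37A that buffers the effect of a second interface mutation would show up as a negative coupling energy for the corresponding double mutant.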
The dramatic effect of a single mutation on binding specificity in the cohesin-dockerin interaction studied here contrasts with the robustness reported for other systems, such as the colicin-immunity protein interaction (6). Reducing or switching specificity in the colicin-immunity protein interaction requires several mutations, which were found only in an extended in vitro evolution experiment (2) and were not identified computationally (44, 45). The cohesin-dockerin interaction, in contrast, presents a much simpler picture in which major effects on specificity can be achieved by a single residue, as demonstrated previously for the dockerin module (8) and shown here for the cohesin module.
It therefore seems that functional rather than biophysical constraints have shaped the differing plasticity of these two high affinity associations. The interaction between cohesin and dockerin plays a very different biological role from that between colicin and its immunity protein. Binding of the immunity protein to the colicin endonuclease is crucial, because failure to bind would leave active nuclease molecules within the cell, leading to bacterial death. This interaction has therefore evolved to be robust against single mutations with strong effects on binding. Consequently, crossing strain-specific barriers would be difficult, as it would require mutations that gain promiscuity at the price of reduced binding affinity.
In contrast, we speculate that the cohesin-dockerin interaction within a species might have evolved toward promiscuity to allow extensive diversity in the hydrolase composition of cellulosome subpopulations (because each repeated cohesin module can bind a range of different dockerin modules, each connected to a distinct hydrolase). This platform for diversity can be tuned on demand toward maximal degradation efficiency for specific cellulosic substrates. Expression of specific hydrolases is regulated at the transcriptional level by specific σ factors, which are released from carbohydrate sensor proteins upon their binding to specific plant cell wall polysaccharide substrates (46–48). Because the dockerin modules of the different hydrolases bind promiscuously to the repeated cohesin modules of the scaffoldin, the composition of this malleable platform can follow the specifically induced hydrolases, resulting in cellulosomes optimized for degradation of the specific substrate(s).
Despite the fundamental differences between the cohesin-dockerin and colicin-immunity protein interactions described above, their interfaces are organized similarly. In-depth mutational analysis of the colicin-immunity protein interaction has suggested a so-called “dual recognition mechanism” (6, 49) involving two distinct patches at the interface: one conserved patch, formed by two conserved central tyrosine residues, is responsible for binding affinity, and the second patch, a network of hydrogen bonds, governs binding specificity. (This dual recognition mechanism should not be confused with the “dual binding mode” of the cohesin-dockerin interaction, described in the Introduction, which allows dockerin to bind in two opposite orientations (13, 16).) The interface of the C. thermocellum cohesin-dockerin interaction is likewise composed of a conserved hydrophobic patch and a specific polar patch. We have shown here that the hydrophobic patch plays an important role in binding affinity, because mutation of a central position, L83A, significantly impairs binding (Fig. 2). The high cross-species sequence conservation of this residue (Fig. 1C), despite its hydrophobic and exposed nature, indicates a more general role of this patch in cohesin-dockerin binding affinity. In contrast, binding specificity is conferred by the polar patch, which is conserved only within C. thermocellum and is replaced by a hydrophobic patch in C. cellulolyticum. In C. thermocellum, this patch contains at least one residue, Asn37, whose mutation to alanine can overcome the species barrier and create a promiscuous cohesin that binds both its cognate (C. thermocellum) and non-cognate (C. cellulolyticum) dockerins (Fig. 6).
The large number of charged residues at the C. thermocellum cohesin-dockerin interface suggests that this interaction is governed largely by electrostatics. Because asparagine is nearly isosteric with aspartate, the detrimental effect of the C. thermocellum cohesin D39N mutation on binding (first reported in Ref. 29 and reconfirmed here by iELISA; Fig. 3D) suggests that it is caused by removal of the negative charge. The importance of this charge at position 39 was not foreseen by our molecular modeling (Fig. 4), even after we included a Coulomb electrostatics term in the Rosetta scoring function, used a generalized Born treatment of electrostatic solvation energy implemented in the Rosetta modeling suite (50), and performed a detailed calculation of the pKa of Asp39 using the APBS modeling suite (51).
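Attributing the D39N effect to charge removal presumes that Asp39 is largely ionized under assay conditions. As a rough check, the Henderson–Hasselbalch relation gives the fraction of an acidic side chain in its deprotonated (charged) form as 1/(1 + 10^(pKa − pH)). The sketch below assumes a model aspartate side-chain pKa of about 3.9 and a buffer at pH 7.0; neither the assay pH nor the computed pKa of Asp39 is given in this section, and the function name is hypothetical.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an acid in its deprotonated (charged) form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Assumed values: a model Asp side-chain pKa (~3.9) plus two upshifted pKa
# values of the kind a buried or desolvated interface environment can produce,
# all evaluated in a hypothetical pH 7.0 buffer.
for pka in (3.9, 5.0, 6.0):
    print(f"pKa {pka}: {fraction_deprotonated(pka, 7.0):.3f} charged at pH 7.0")
```

Under these assumptions, the side chain stays more than 90% charged at neutral pH unless its pKa is shifted above about 6, so an Asp-to-Asn substitution would remove nearly a full negative charge.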
Molecular dynamics simulations of wild-type cohesin and the D39N mutant suggested that removal of the charge at position 39 leads to significant destabilization of a nearby region, which is not observed in the wild type (52). However, these simulations started from the crystal structure solved by Carvalho et al. (12) (Protein Data Bank code 1OHZ), in which the critical Asn37 side chain was incorrectly fitted. In the wild-type simulation, the Asn37 side chain flips at the very beginning of the simulation, whereas in the D39N simulation it does not, so the two trajectories effectively sample different Asn37 conformations. Consequently, the pronounced effect reported by Xu et al. (52) likely reflects a non-equilibrated starting conformation.
The ability to identify interface hot spot residues is the basis for more complex redesign of interfaces and binding specificity (53, 54). Considerable effort has therefore been invested in developing computational approaches that model the effect of point mutations on protein stability and binding affinity (33, 34, 36, 55, 56). Despite good overall accuracy, these approaches are still challenged by complex polar effects at interfaces, as the present study also demonstrates. Nevertheless, combined with structural analysis, they enabled us to pinpoint the crucial interface interactions that contribute to binding in different ways.
Beyond addressing fundamental questions about the specificity and promiscuity of the cohesin-dockerin interaction, our study has biotechnological implications for the fabrication of artificial designer cellulosome complexes (19–21, 57–64). Specific cohesin-dockerin pairs could allow the generation of cellulosome complexes with predetermined content and spatial architecture. This study provides initial support for the feasibility of such an application. Future steps will involve the generation of additional specific cohesin-dockerin pairs.
We thank Oz Sharabi and Julia Shifman for evaluating the effects of different mutants on binding with the Orbit energy function. We also thank Tim Whitehead for reading the manuscript and for helpful suggestions.
*This work was supported, in whole or in part, by the Israel Science Foundation (ISF) funded by the Israel Academy of Science and Humanities (Grants 306/06 and 319/11 to O. S.-F., 1349/13 to E. A. B., and 24/11 to R. L. by The Sidney E. Frank Foundation through the ISF); the United States-Israel Binational Science Foundation, Jerusalem, Israel (Grants 2009418 to O. S.-F. and 2013284 to E. A. B.); the European Research Council (Starting Grant 310873 to O. S.-F.); the European Union, Area NMP.2013.1.1-2: (Self-assembly of naturally occurring nanosystems: CellulosomePlus Project 604530 to E. A. B.), and a European Research Area-Industrial Biology Consortium (EIB.12.022, acronym FiberFuel, to E. A. B.).
5M. Slutzki, I. Noach, and E. A. Bayer, unpublished results.
4The abbreviations used are: