Protein physicochemical properties must undergo complex changes during evolution: in response to changes in the organism's environment, as a result of proteins taking up new roles, or because of the need to cope with the evolution of interacting molecular partners. Recent work has emphasized the role of stability and stability–function trade-offs in these protein adaptation processes. In the present study, on the other hand, we report that combinations of a few conservative, high-frequency-of-fixation mutations in the thioredoxin molecule lead to largely independent changes in both stability and the diversity of catalytic mechanisms, as revealed by single-molecule atomic force spectroscopy. Furthermore, the changes found are evolutionarily significant, as they combine typically hyperthermophilic stability enhancements with modulations in function that span the range defined by the quite different catalytic patterns of thioredoxins of bacterial and eukaryotic origin. These results suggest that evolutionary protein adaptation may, at least in some cases, exploit the potential of conservative mutations to generate a multiplicity of evolutionarily allowed mutational paths leading to a variety of protein modulation patterns. In addition, the results support the feasibility of using evolutionary information to achieve protein multi-feature optimization, an important biotechnological goal.
Proteins must undergo adaptive changes during evolution when, for instance, the environment surrounding the organism is altered or when they are recruited for new roles [1,2]. Although these changes must involve modulation of several protein properties, recent work has mostly emphasized the role of protein stability in molecular evolution [3–6]. A common argument is that most mutations affect stability, whereas only a few are likely to affect function. Furthermore, experimental studies show that most mutations are destabilizing and, consequently, the accumulation of a few mutations may compromise the so-called ‘protein fitness’ because of the concomitant sharp decrease in stability. In addition, it is often assumed that the evolution of biological function is ‘limited’ or ‘constrained’ by the destabilizing effects of mutations, as stability and function are generally presumed to trade off. As a well-known example, mutations conferring on the bacterial TEM-1 β-lactamase resistance to third-generation antibiotics were found to be destabilizing. Molecular evolution governed by trade-off and sign-epistasis effects would be expected to follow only a few mutational paths, as mutations must occur in a rather specific temporal order to avoid deleterious intermediate combinations. In fact, the possibility of ‘re-winding’ the molecular tape of life has been suggested.
In the present study, we explore a point of view that differs from that often found in the recent literature and summarized above. We reason that, as adaptations to new situations are common during evolution, many proteins are poised to change their properties efficiently, at least within certain evolutionarily relevant ranges; this should particularly be the case for proteins involved in several molecular tasks, which may have to cope with evolutionary changes in many interaction partners. We thus propose a mechanism for efficient adaptation based on a set of mutations with four features. First, the mutations in the set have a high frequency of fixation during evolution. That is, they belong to the class of mutations loosely described in various contexts as ‘conservative’, ‘non-disruptive’, ‘neutral’, ‘nearly-neutral’ or ‘quiet’. We use the term conservative in the present paper, but we specifically refer to mutations with non-negative coefficients in substitution matrices (such as the PAM250 matrix of Dayhoff et al.) and, consequently, with a high frequency of fixation during evolution. Secondly, both stability-related and function-related properties are modulated by the mutations in the set, with no strong bias towards stability over function. Thirdly, the effects of the individual mutations in the set are roughly independent and, therefore, strong trade-off and sign-epistasis effects do not occur. Fourthly, and as a result, a diversity of mutational pathways leading to a variety of complex modulation patterns becomes available to Darwinian evolution, thus allowing efficient protein adaptation within some evolutionarily significant range of protein property values.
In the present study, we provide experimental support for the above proposal by showing that combinations of a few conservative mutations in the thioredoxin molecule lead to large, independent and evolutionarily meaningful changes in stability and in the complex patterns of catalysis revealed by single-molecule AFM (atomic force microscopy). Thioredoxin is, in fact, an excellent model system for investigating the issues raised above, for the following three reasons.
First, thioredoxins are present in all known organisms and, consequently, exist in organisms that thrive in widely different environments. Temperature is a particularly relevant environmental factor, and we may expect thioredoxins from psychrophilic, mesophilic, thermophilic and hyperthermophilic organisms to show quite different thermal stabilities. Secondly, thioredoxins catalyse the reduction of target disulfide bonds, regulating a multitude of cellular processes. A previous proteomic analysis identified 80 proteins associated with thioredoxin and showed that it is involved in at least 26 cellular processes in Escherichia coli; furthermore, additional functions and protein targets for thioredoxin have been reported in eukaryotes. Clearly, a high potential for adaptation is to be expected for thioredoxin, a protein that must cope with evolutionary changes in a multitude of interaction partners and associated functional roles. Thirdly, recent single-molecule work [14,15] has revealed a diversity of reduction mechanisms in thioredoxin, with evolutionarily differentiated patterns of catalysis that correlate well with the domains of life. These studies define the evolutionary range of variation in the chemistry of thioredoxin catalysis, thus providing a clear reference framework for assessing the evolutionary significance of mutation-induced modulations.
Single-molecule AFM allows the application of a calibrated force to a disulfide-bonded substrate, making it possible to study catalytic mechanisms with sub-Ångström precision. When the substrate is stretched at low forces, all thioredoxins share a Michaelis–Menten-type reduction mechanism. At high forces, on the other hand, there are clear-cut differences between thioredoxins of eukaryotic and bacterial origin. In the former, disulfide-bond reduction occurs through an SET (single electron transfer) reaction, whereas thioredoxins of bacterial origin also exhibit a simple nucleophilic substitution (SN2) mechanism and thus show a rate increase at high force that is absent in eukaryotic thioredoxins. Notably, the high-force SN2 mechanism is equivalent to that observed in disulfide-bond reduction by chemicals such as dithiothreitol, glutathione or cysteine [16,17]. The presence of this simple chemical SN2 mechanism in bacteria may be related to their ability to survive in extreme conditions. By contrast, the simple chemical SN2 mechanism would appear to be detrimental in eukaryotic cells, and thioredoxins of eukaryotic origin seem to have been naturally selected to suppress this reduction pathway. A structural interpretation in terms of a narrowing of the substrate-binding groove in eukaryotic thioredoxins has been advanced.
Further details are provided in the Supplementary Experimental section at http://www.BiochemJ.org/bj/429/bj4290243add.htm.
The sequence alignment used has been described in detail previously [18,19]. Briefly, BLAST2 (http://blast.wust1.edu) was used to search the UniProt database with default search options, using the sequence of E. coli thioredoxin (PDB code 2TRX) as the query. The resulting sequences were aligned using the Smith–Waterman algorithm. Those belonging to proteobacteria and with a sequence identity to the query higher than 0.25 were used to calculate the ratios of frequencies of occurrence given in Figure 1(A). Information from  and  was used to examine the temperature range for growth and the optimal growth temperature of the 100 proteobacteria whose sequences were included in the alignment. All these micro-organisms are mesophiles, with the exception of two psychrophiles. The library of thioredoxin variant sequences was constructed using gene assembly mutagenesis. Thermal stability was determined by differential scanning calorimetry [18,23]. Single-molecule AFM experiments were carried out as described previously [14,15]. Briefly, the protein substrate was a polypeptide made of eight repeats of the I27 domain of human cardiac titin with engineered cysteine residues. A custom-built AFM controlled by an analogue proportional-integral-derivative feedback system was used to follow the thioredoxin-catalysed reduction of individual disulfide bonds under a stretching force applied to the substrate.
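The filtering and frequency-ratio step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the alignment is already available as gapped strings of equal length aligned to the E. coli query, that identity is computed over non-gap query positions, and that the "ratio of frequencies" for a position means the count of the commonest non-query residue divided by the count of the query residue. The function names `identity_to_query` and `frequency_ratios` are hypothetical.

```python
from collections import Counter

def identity_to_query(query, other):
    """Fraction of non-gap query positions at which `other` carries the same residue."""
    pairs = [(q, o) for q, o in zip(query, other) if q != "-"]
    if not pairs:
        return 0.0
    return sum(q == o for q, o in pairs) / len(pairs)

def frequency_ratios(query, alignment, min_identity=0.25):
    """Rank query positions by (count of commonest alternative) / (count of query residue).

    Only sequences whose identity to the query exceeds `min_identity` are counted.
    Returns tuples (position, query residue, alternative residue, ratio), highest first.
    """
    kept = [s for s in alignment if identity_to_query(query, s) > min_identity]
    results = []
    position = 0  # numbering along the query sequence, skipping query gaps
    for column, query_residue in enumerate(query):
        if query_residue == "-":
            continue
        position += 1
        counts = Counter(s[column] for s in kept if s[column] != "-")
        alternatives = [(n, aa) for aa, n in counts.items() if aa != query_residue]
        if not alternatives:
            continue  # fully conserved column: no candidate mutation
        n_alt, alt = max(alternatives)
        n_query = counts.get(query_residue, 0)
        ratio = n_alt / n_query if n_query else float("inf")
        results.append((position, query_residue, alt, ratio))
    return sorted(results, key=lambda r: r[3], reverse=True)

# Toy data only (not real thioredoxin sequences); the last sequence fails the identity filter.
query = "ADKIA"
alignment = ["ADKIA", "PDKIA", "PDKVA", "PEKVA", "WWWWW"]
print(frequency_ratios(query, alignment)[0])  # (1, 'A', 'P', 3.0)
```

In this toy case position 1 ranks first because the alternative residue P outnumbers the query residue A three to one among the retained sequences.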
To investigate how evolutionarily relevant changes in thioredoxin properties may be achieved, we identified a number of high-frequency-of-fixation mutations using an alignment of 100 sequences derived from a database BLAST search with the E. coli thioredoxin sequence as the query. It is important to note from the outset that the alignment contains exclusively sequences of proteins from proteobacteria. All proteins in the alignment are thus of bacterial origin and, furthermore, belong to mesophilic or (in a few cases) psychrophilic organisms. That is, thioredoxins from thermophilic or hyperthermophilic organisms are not represented in the alignment. Despite this, the three highest-frequency mutations in the alignment led to a stability enhancement, an increase in the denaturation temperature of approx. 10 °C with respect to the wild-type E. coli protein, as shown in previous work. The corresponding triple-mutant variant (referred to as V3) was the starting point of the analysis in the present study. Using V3 as the background, we constructed a combinatorial library using the next eight highest-frequency mutations (Figure 1A). This library contains 256 (2^8) variants, of which 23 were randomly selected and subjected to single-molecule determination of the reduction rates at 75 pN (a low force at which the Michaelis–Menten mechanism is the major contribution) and 500 pN (a high force which reveals the contribution of the simple chemical SN2 mechanism). The surprising result (Figure 1B) is that the library variants showed considerable variability in both the high-force and low-force rates, in fact spanning the range defined by the quite different catalysis patterns of thioredoxins of bacterial and eukaryotic origin.
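The size of the library follows from each of the eight mutations being either present or absent on the V3 background. The sketch below enumerates such a library and draws a random subset of 23. Five of the eight mutations are named in this paper (they make up the trx* variant described later); the other three are labelled `mut6` to `mut8` as hypothetical placeholders for the remaining entries of Figure 1(A). The fixed random seed is only for reproducibility of the sketch.

```python
import itertools
import random

V3_BACKGROUND = ("A22P", "I23V", "P68A")
# D10A..A87V are named in the text; mut6..mut8 are hypothetical placeholders.
LIBRARY_MUTATIONS = ("D10A", "Q50A", "G74S", "E85Q", "A87V", "mut6", "mut7", "mut8")

def build_library(mutations, background=V3_BACKGROUND):
    """Every presence/absence combination of `mutations`: 2**len(mutations) variants."""
    library = []
    for mask in itertools.product((0, 1), repeat=len(mutations)):
        extra = tuple(m for m, bit in zip(mutations, mask) if bit)
        library.append(background + extra)
    return library

library = build_library(LIBRARY_MUTATIONS)
rng = random.Random(0)              # seeded so the example is repeatable
screened = rng.sample(library, 23)  # random subset of distinct variants
print(len(library), len(screened))  # 256 23
```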
We also assessed the thermodynamic stability (equilibrium denaturation temperature as measured by scanning calorimetry) of the library variants; again, large variability was found, although a trend towards stability enhancement is clearly apparent (Figures 2A and 2B). This trend may reflect a threshold evolutionary limit for stability, which would lead to statistical preferences for stabilizing mutations in the alignment. Nevertheless, the important point is that a small number of mutations determine simultaneous modulations in stability and in complex patterns of catalysis. It must also be emphasized that these modulations are observed in variants extracted from a combinatorial library, a procedure that in fact reproduces the situations found during thioredoxin evolution: the mutations we use are conservative, high-frequency ones and, as a result, they appear combined very often during evolution. Indeed, the distribution of combinations of these mutations in the alignment is close to the binomial distribution (Figure 2C).
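The comparison with a binomial distribution can be illustrated as follows: count how many of the library mutations each aligned sequence carries, and compare that histogram with a binomial expectation. This is a sketch under assumptions: the input is a hypothetical 0/1 matrix (one row per aligned sequence, one column per library mutation), and a single average mutation frequency p is used, whereas the analysis behind Figure 2C may have treated positions differently.

```python
from math import comb

def mutation_count_distribution(carriers):
    """carriers: one 0/1 row per aligned sequence, one column per library mutation.

    Returns (observed, expected): number of sequences carrying k mutations, for
    k = 0..n, and the binomial expectation for the same number of sequences.
    """
    n_seq = len(carriers)
    n_mut = len(carriers[0])
    observed = [0] * (n_mut + 1)
    for row in carriers:
        observed[sum(row)] += 1
    # Single average frequency p (assumption: ignores differences between positions)
    p = sum(sum(row) for row in carriers) / (n_seq * n_mut)
    expected = [n_seq * comb(n_mut, k) * p**k * (1 - p) ** (n_mut - k)
                for k in range(n_mut + 1)]
    return observed, expected
```

Large departures of `observed` from `expected` would hint that some combinations are avoided or favoured, which is what strong epistasis could produce.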
The modulations found in stability and catalysis appear to be largely independent of each other. Qualitatively, this is shown by the fact that the plots of the properties for the library variants (Figures 1B, 2A and 2B) are essentially scattergrams with little correlation, whereas non-independence between two properties (because of trade-offs, or because both properties reflect a common underlying feature) would result in correlated plots. The absence of clear correlations in these plots therefore reveals the important result that several properties can be modulated to a significant extent, and independently, by suitable combinations of conservative mutations.
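A simple number that complements the visual impression of a scattergram is the Pearson correlation coefficient for each pair of properties. The sketch below computes it with NumPy for the three pairs discussed here; the argument names are hypothetical and the function expects one value per library variant for each property.

```python
import numpy as np

def pairwise_correlations(tm, low_rate, high_rate):
    """Pearson r for each pair of properties measured on the same library variants."""
    data = np.vstack([tm, low_rate, high_rate])  # rows = properties, columns = variants
    r = np.corrcoef(data)
    return {
        ("Tm", "low-force rate"): r[0, 1],
        ("Tm", "high-force rate"): r[0, 2],
        ("low-force rate", "high-force rate"): r[1, 2],
    }
```

Values of r near zero correspond to the uncorrelated scattergrams described above; values near +1 or -1 would signal coupled properties such as a trade-off.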
To quantify further the independence between the studied properties, we carried out a PCA (principal component analysis), a mathematical procedure that identifies the directions along which sample variation is maximal and reduces the dimensionality of data sets. In a first step, we applied PCA to the three two-property sets that can be derived from our results (i.e. low-force rate against denaturation temperature, high-force rate against denaturation temperature, and high-force rate against low-force rate). For two strongly correlated properties, PCA should reveal one major component together with a very minor one, indicating that the dimensionality of the set can be reduced to one. However, for our three two-property sets (Figures 3A–3C), the contributions of the first and second components both appear significant, indicating very weak correlations and therefore efficient independent modulation. In a second step, we performed PCA on the whole three-property set; Figure 3(D) shows the percentage of data variance explained by the resulting three principal components. Although the first and second components dominate, even the third makes a significant contribution (approx. 10% of the variance).
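The fraction of variance explained by each principal component can be obtained from a singular value decomposition of the centred data matrix. The sketch below is one common way to do this with NumPy. It assumes each property is standardized (z-scored) first because the properties have different units; the text does not state whether the original analysis did this, so treat that choice as an assumption.

```python
import numpy as np

def explained_variance(X):
    """X: variants x properties. Returns the fraction of variance per principal component."""
    X = np.asarray(X, dtype=float)
    # z-score each property (assumption: needed because units differ between columns)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the centred, scaled matrix are proportional
    # to the variances along the principal components
    s = np.linalg.svd(Z, compute_uv=False)
    var = s**2
    return var / var.sum()
```

For two perfectly correlated properties the first component explains all of the variance; for two uncorrelated properties of equal spread the two components share it equally, which is the situation the text describes as efficient independent modulation.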
The fact that stability-related and catalysis-related properties can be modified independently implies that, in principle, different patterns of protein modulation are possible. This is evident in the wide variety of property combinations for the studied library variants shown in Figures 1(B), 2(A) and 2(B). Clearly, the set of mutations studied has a high potential for protein multi-feature modulation, as is dramatically illustrated by the combination of stability enhancement and a eukaryotic signature of catalysis that we describe below.
The library variants were prepared with a His6 tag for ease of purification. Previous studies have shown that His6 tags have a negligible effect on reduction rates measured by single-molecule AFM, and the results reported in the present study indicated only a small effect on stability. Nevertheless, we carried out a detailed analysis of a particularly interesting library variant, referred to as trx*, prepared without the tag. trx* is a variant of E. coli thioredoxin with the following mutations: A22P, I23V and P68A (i.e. the V3 background) plus D10A, Q50A, G74S, E85Q and A87V (see the Supplementary Results and discussion section at http://www.BiochemJ.org/bj/429/bj4290243add.htm for further information). A comparison between trx* and wild-type thioredoxin from E. coli (both without His6 tags) is shown in Figure 4. trx* had a denaturation temperature of 108 °C (approx. 20 °C higher than that of wild-type E. coli thioredoxin) and a very slow unfolding rate (approx. 15000-fold slower than that of wild-type E. coli thioredoxin). Furthermore, in single-molecule experiments, it showed an enhanced low-force reduction rate and a diminished high-force rate, indicating a depressed simple SN2 mechanism. In fact, the pattern of catalysis of trx* approaches that of a eukaryotic thioredoxin (see Figure 1B). The combination of high stability and a eukaryotic pattern of catalysis in trx* is particularly suggestive as, in most cases, eukaryotic organisms are not thermophiles.
The fact that a certain combination of mutations leads to a pattern of protein properties that enhances fitness does not necessarily imply that such a pattern is accessible through Darwinian evolution. Evolutionary accessibility requires that a mutational path exists that leads to the desired combination without involving deleterious intermediate combinations. However, the protein multi-property modulations described in the present study are based on conservative mutations, which are not likely to show strong trade-off or sign-epistasis effects. This is supported by three points. First, analysis of the combinatorial library data in terms of an ‘independent mutation effect model’ yields small values for the mutation effects on activity and catalysis (Supplementary Figure S5 at http://www.BiochemJ.org/bj/429/bj4290243add.htm). Secondly, strong trade-off effects would cause residue co-evolution (for instance, a mutation that improves function while strongly decreasing stability would occur during evolution coupled with mutations that strongly increase stability); however, a covariance analysis of the alignment (Supplementary Figure S3 at http://www.BiochemJ.org/bj/429/bj4290243add.htm) indicates that co-evolution between the positions included in our library is not significant. Thirdly, the principal component analyses shown in Figure 3 do not reveal strong correlations. Accordingly, we may expect a significant number of evolutionarily allowed mutational paths to exist for each specific modulation pattern. This idea is illustrated by the simple calculation described below.
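An independent (additive) mutation effect model treats each property as a baseline value plus a fixed contribution from each mutation that is present, with no interaction terms. The sketch below fits such a model by ordinary least squares with NumPy. That this is exactly the model used for Supplementary Figure S5 is an assumption; the function name `fit_additive_effects` is hypothetical. Large residuals from such a fit would point to epistasis, since an additive model cannot reproduce them.

```python
import numpy as np

def fit_additive_effects(genotypes, values):
    """Least-squares fit of value = b0 + sum_i b_i * x_i (no interaction terms).

    genotypes: variants x mutations matrix of 0/1 (mutation absent/present).
    values: one measured property per variant.
    Returns (intercept, per-mutation effects, residuals).
    """
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(values, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # leading column of ones = intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return coef[0], coef[1:], residuals
```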
Assume that, over some evolutionary time span, changes in the organism's environment and in biological function take place in such a way that increased fitness is brought about by a thioredoxin of high stability, an enhanced rate of catalysis through the ‘specific’ Michaelis–Menten mechanism and a depressed rate of catalysis through the ‘chemical-like’ SN2 mechanism. This multi-property modulation is in fact achieved by the five mutations D10A, Q50A, G74S, E85Q and A87V present in the trx* variant described in the preceding section. Five mutations can occur in 5! = 120 different temporal orders. The question is how many of these 120 mutational paths are evolutionarily allowed, in the sense that they do not involve deleterious intermediate combinations (i.e. combinations with strongly decreased fitness). We assumed a simple linear model for the relationship between protein fitness (f) and the values of the properties under consideration, eqn (1):
$$ f = \frac{1}{3}\sum_{i=1}^{3} \frac{\delta P_i}{\Delta P_i} \qquad (1) $$
where P1, P2 and P3 refer to the denaturation temperature (a measure of stability), the rate of disulfide reduction in single-molecule AFM experiments at low force (a measure of the contribution of the Michaelis–Menten mechanism) and the rate at high force (a measure of the chemical SN2 mechanism), respectively. δPi is the change in property Pi with respect to its ‘starting’ value (the value when none of the five mutations has occurred), and ΔPi is the total change, i.e. the final value of the property (when all five mutations have occurred) minus the starting value. Eqn (1) thus scales fitness arbitrarily so that f = 0 with no mutations and f = 1 with all five mutations present. More importantly, eqn (1) assumes that fitness increases linearly as the properties approach their final (five-mutation) values and that the three properties contribute equally to fitness (the same weight in the fitness equation). Eqn (1) allows us to calculate a fitness value for each node in any of the 120 mutational paths, provided that the mutation effects on the properties are known. For the illustrative purposes of this calculation, we estimated these effects from a least-squares fit of a linear model to the experimental results (see the Supplementary Results and discussion section). The resulting profiles of fitness against the number of accumulated mutations are shown in Figure 5 for all 120 mutational paths. Clearly, most of the paths involve increases in fitness with no deleterious intermediate combinations (i.e. no combinations with decreased fitness with respect to the initial value). Furthermore, even those paths that involve a decrease in fitness at some nodes do so by only a comparatively small amount.
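The path calculation can be sketched as follows, reading eqn (1) as f = (1/3)(δP1/ΔP1 + δP2/ΔP2 + δP3/ΔP3) from the definitions above and assuming additive mutation effects. The effect values in `EFFECTS` are invented for illustration only and are not the fitted values from the paper; a path is counted as "allowed" here if no intermediate node falls below the starting fitness of 0.

```python
from itertools import permutations

# Illustrative additive effects per mutation: (dTm, d low-force rate, d high-force rate).
# These numbers are invented for the sketch, NOT the paper's fitted values.
EFFECTS = {
    "D10A": (4.0, 0.02, -0.05),
    "Q50A": (3.0, 0.01, -0.02),
    "G74S": (-1.0, 0.03, -0.04),
    "E85Q": (2.0, -0.01, -0.03),
    "A87V": (2.0, 0.01, 0.04),
}

def fitness(mutations, effects=EFFECTS):
    """Eqn (1): f = (1/3) * sum_i dP_i / DeltaP_i, with additive mutation effects."""
    total = [sum(e[i] for e in effects.values()) for i in range(3)]  # DeltaP_i
    delta = [sum(effects[m][i] for m in mutations) for i in range(3)]  # dP_i
    return sum(d / t for d, t in zip(delta, total)) / 3

def classify_paths(effects=EFFECTS, tol=1e-12):
    """Fitness profile of every temporal order, and whether it never dips below f = 0."""
    profiles = {}
    for order in permutations(effects):
        profiles[order] = [fitness(order[:k], effects) for k in range(len(order) + 1)]
    allowed = {o: all(f >= -tol for f in p) for o, p in profiles.items()}
    return profiles, allowed

profiles, allowed = classify_paths()
print(len(profiles), sum(allowed.values()))  # 120 96
```

With these made-up numbers only A87V lowers fitness on its own, so the 24 paths that start with it are the only ones with a deleterious intermediate. Because the effects are additive, each mutation adds a fixed amount to f whatever its position in the path, which is why a linear model makes the path count easy to reason about.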
It is also worth noting again that the five mutations involved in the calculation were derived from a statistical analysis of a sequence alignment; that is, these mutations occur with high frequency during thioredoxin evolution. Q50A cannot be achieved by a single-base substitution; over an evolutionary time scale, this mutation likely occurs with a glutamate residue as an intermediate (Q→E→A). In fact, all three amino acids in this sequence occur with high frequency at position 50 of the alignment (11 occurrences of glutamine, 20 of glutamate and 23 of alanine).
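Why Q50A needs two base changes, while the other four trx* mutations need one, can be checked against the standard genetic code. The sketch below builds the standard code table and computes the minimum number of nucleotide substitutions between any codon of one amino acid and any codon of another; the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered by first, second, third base in TCAG order
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codons(aa):
    """All codons encoding the one-letter amino acid `aa`."""
    return [c for c, a in CODE.items() if a == aa]

def min_base_changes(aa1, aa2):
    """Fewest nucleotide substitutions turning some codon of aa1 into some codon of aa2."""
    return min(sum(x != y for x, y in zip(c1, c2))
               for c1 in codons(aa1) for c2 in codons(aa2))

for m in ["D10A", "Q50A", "G74S", "E85Q", "A87V"]:
    print(m, min_base_changes(m[0], m[-1]))  # only Q50A needs 2
print("Q->E", min_base_changes("Q", "E"), "E->A", min_base_changes("E", "A"))  # 1 and 1
```

Glutamine codons (CAA, CAG) and alanine codons (GCN) differ at both of the first two positions, whereas each step of the Q→E→A route changes a single base.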
Finally, we must emphasize that the calculation described above and shown in Figure 5 is based on an artificial, hypothetical fitness function and is meant only for illustration. In particular, the actual relationship between in vivo fitness and stability/reduction rates may be much more complex than eqn (1) suggests. Notwithstanding this, our illustrative calculation does provide some support for the notion that conservative mutations may contribute significantly to evolutionary protein adaptation: not only may they originate a variety of patterns of multi-property modulation, but it is also likely that many mutational paths leading to each modulation pattern are accessible to Darwinian evolution.
Overall, we have shown evolutionarily significant modulation of protein properties related to stability and catalysis on the basis of a small number of mutations. The mutations used are frequently fixed during evolution, as deduced not only from the alignment analysis (Figure 1A), but also from their non-negative coefficients in the PAM250 substitution matrix, which range from 0 (D/A, Q/A and A/V substitutions) to 7 (the Y/F substitution). Nevertheless, the modulations found by combining these mutations take thioredoxin properties clearly outside the range of values expected for the proteins in the sequence alignment used as a starting point. Thus we obtained a eukaryotic signature of catalysis, even though all sequences in the alignment are of bacterial origin, and typically thermophilic/hyperthermophilic stability enhancements, even though the sequences in the alignment belong to mesophilic or psychrophilic organisms. Furthermore, the combinatorial library screened is very small (256 variants), only about 20 variants were analysed, and significant modulations are found in essentially all of them. Clearly, in vitro selection plays no role in the results obtained, and we conclude that we have simply tapped a fundamental strategy for protein adaptation during evolution. A plausible interpretation of this strategy is summarized below.
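The "non-negative PAM250 coefficient" criterion can be checked mechanically. The sketch below hard-codes only the PAM250 scores for the substitutions named in this paper (the V3 background and the five trx* mutations, plus the Y/F pair) rather than the full matrix, so the dictionary should be verified against a complete PAM250 table before reuse. The mutation notation parser assumes the wild-type letter, position, mutant letter format used in the text.

```python
# Subset of PAM250 (Dayhoff et al.) scores for substitutions discussed in the text only;
# verify against a complete PAM250 table before relying on these values.
PAM250_SUBSET = {
    frozenset("AD"): 0, frozenset("AQ"): 0, frozenset("AV"): 0, frozenset("FY"): 7,
    frozenset("GS"): 1, frozenset("EQ"): 2, frozenset("AP"): 1, frozenset("IV"): 4,
}

def pam250(mutation):
    """Score for a mutation written like 'D10A' (wild-type letter, position, mutant letter)."""
    return PAM250_SUBSET[frozenset((mutation[0], mutation[-1]))]

mutations = ["A22P", "I23V", "P68A", "D10A", "Q50A", "G74S", "E85Q", "A87V"]
scores = {m: pam250(m) for m in mutations}
all_conservative = all(s >= 0 for s in scores.values())
print(scores, all_conservative)
```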
Mutations that are fixed often during evolution (the type of mutations that are usually labelled as conservative, neutral, nearly-neutral, quiet, etc.) can actually modulate a variety of relevant protein properties. These mutations are non-disruptive and are not expected to involve strong trade-offs or sign-epistasis effects [7,9]. Different combinations of these high-frequency mutations will thus be available. For instance, combinations of the mutations included in our combinatorial library do occur in the alignment with a distribution that does not depart dramatically from the binomial one (Figure 2C). The general implication is that natural selection may operate on many different mutational combinations, that a wide variety of patterns of protein modulation become available to Darwinian evolution, and that the protein can easily cope with new situations. For instance, the variant of E. coli thioredoxin we have termed trx* displays a thermophilic/hyperthermophilic stability enhancement and a typically eukaryotic signature of catalysis. This combination of features may occur rarely in natural thioredoxins (as eukaryotic organisms are not usually thermophiles), but thioredoxins should nevertheless easily achieve this modulation pattern during evolution, provided that it brings about an adaptive advantage.
Of course, stability and the pattern of catalysis in single-molecule AFM are just the ‘tip of the iceberg’: they are simply two important biophysical features that we can assess in in vitro experiments. Many other features, related to the multitude of roles and interactions of thioredoxin in vivo, may be expected to be modulated by high-frequency mutations. This suggests the possibility of protein multi-feature optimization, an important biotechnological goal that is not easily achieved with traditional methods. For instance, consensus approaches based on sequence-alignment statistics have mostly been used to enhance a single protein feature (stability in most cases, with some notable exceptions), and studies on both stability and catalysis usually emphasize trade-offs (mutations that enhance activity often have deleterious effects on stability). Multi-feature optimization has certainly been reported in the literature, but it seems to require screening of large libraries or extensive prior knowledge of the effects of the mutations employed. By contrast, the results reported in the present paper suggest that screening of very small combinatorial libraries designed using sequence-alignment information could easily lead to simultaneous modulation/optimization of several relevant protein properties.
David Rodriguez-Larrea prepared the variants from the combinatorial library and designed, performed and analysed the experiments addressed at determining their stability, activity and folding/unfolding kinetics. Raul Perez-Jimenez, Inmaculada Sanchez-Romero and Julio Fernandez designed, performed and analysed the single-molecule AFM experiments. Inmaculada Sanchez-Romero carried out the principal component analyses. Asuncion Delgado-Delgado analysed the sequence alignment used in terms of the biological classification and the optimum growth temperature of the corresponding organisms. Jose Sanchez-Ruiz designed the research and wrote the paper. All authors discussed the manuscript.
This work was supported by the National Institutes of Health [grant numbers HL061228 and HL066030 (to J.M.F.)]; and by federal funds from the Spanish Ministry of Education [grant numbers BIO2009–09562 and CSD2009-00088 (to J.M.S.-R.)].