Stability of the collagen triple helix is largely governed by its imino acid content, namely the occurrence of proline and 4R-hydroxyproline at the X and Y positions respectively of the periodic (Gly-X-Y)n sequence. Although other amino acids at these positions reduce stability of the triple helix, this can be partially compensated by introducing intermolecular side chain salt bridges. This approach was previously used to design an abc-type heterotrimer composed of one basic, one acidic and one neutral imino acid rich chain (Gauba & Hartgerink, 2007). In this study, an abc-type heterotrimer was designed to be the most stable species using a sequence recombination strategy that preserved both the amino acid composition and the network of interchain salt-bridges of the original design. The target heterotrimer had the highest Tm of 50°C, 7°C greater than the next most stable species. Stability of the heterotrimer decreased with increasing ionic strength, consistent with the role of intermolecular salt bridges in promoting stability. Quantitative meta-analysis of these results and published stability measurements on closely related peptides was used to discriminate the contributions of backbone propensity and side chain electrostatics to collagen stability.
Protein design tests our fundamental understanding of the forces that govern molecular stability and structure. Collagen presents many unique challenges as a design target. In most proteins, an extensive hydrophobic core stabilizes the folded structure, but the collagen triple-helix lacks buried hydrophobic interactions. Instead a network of backbone hydrogen bonds promotes assembly. Amino acid side chains are extensively solvated and are flexible due to minimal tertiary structure constraints. This makes it challenging to model and calculate their contributions to protein energetics. Fibril forming regions of natural collagens can extend over a thousand amino acids, making them difficult to express and characterize. As such, much of our understanding of collagen folding has come from work on short collagen mimetic peptides. These have also served as a powerful platform for developing molecular design rules to manipulate stability and specificity of assembly.
Three polypeptide chains combine to form the collagen triple-helix, each comprised of a canonical (Gly-X-Y)n repeat. Nearly all amino acids are tolerated at both X and Y positions, but the glycine is essential for folding. Glycine mutations significantly reduce stability in model systems1,2 and result in collagen-mediated pathologies.3 Natural collagens are rich in imino acids with proline at position X and hydroxyproline at position Y, which serve to stabilize the triple-helix fold.4,5 The most stable collagen sequence using biogenic amino acids consists solely of repeating Gly-Pro-Hyp triplets (Hyp or O = 4R-hydroxyproline), and one can roughly estimate the stability of collagen based on the imino acid content of the sequence alone.6
Following Pro and Hyp, the next most over-represented amino acids in natural collagens are acidic and basic residues.7,8 Often, these form networks of favorable interchain charge pairs that contribute to triple-helix stability.9–11 Due to the regular structure of collagen, it is possible to infer the presence of side chain charge pairs from sequence. Amino acids at the Y position on one chain are in close proximity to X and X′ positions on the adjacent chain. Side chain salt bridges can confer significant stability, in some cases rivaling the contributions of imino acids.12
These two features of collagen, networks of side chain electrostatic interactions and high imino acid content, were designed into a series of collagen peptides to promote formation of an abc-type heterotrimer.13 Three peptides R10, E10 and POG were studied with sequences (Pro-Arg-Gly)10, (Glu-Hyp-Gly)10 and (Pro-Hyp-Gly)10 respectively. The charged amino acids formed a network of interchain salt-bridges; ideally, a maximum of nineteen ion pairs could exist in the heterotrimer (Figure 1A). The neutral POG chain did not contribute significant side chain interactions, but promoted stability through its rich imino acid content. The E10:R10:POG heterotrimer had a Tm of 54°C, lower than that of the POG homotrimer (Tm = 67.5°C). Upon thermal denaturation, a single transition attributed to an abc-type heterotrimer was observed if the peptides were first heated and annealed.
We hypothesized a thermodynamically optimal abc-type species could be obtained starting from the original peptide sequences. The E10, R10 and POG peptide sequences were recombined so as to maintain key physicochemical elements. The recombination would specifically destabilize competing states (aaa, aab, etc.), leaving the abc-type heterotrimer as the most stable species. This strategy is referred to as negative design. Positive design features the introduction of amino acid substitutions that favor stability of the target conformation, whereas negative design promotes target formation by disfavoring competing states. The redesign was successful, allowing us to develop a quantitative model relating the contributions of backbone propensity and side chain electrostatics to triple-helix stability.
Peptides R, Z, and E were synthesized at the Tufts University Core Facility (http://tufts.org) using Fmoc solid-phase synthesis chemistry.
The peptides were purified by reverse-phase HPLC to greater than 95% purity. The identities of products were verified by mass spectrometry (Figure S1). Peptide concentrations were determined by measuring absorbance at 214 nm, using an extinction coefficient of 2200 M−1 cm−1 per peptide bond.
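The concentration determination above follows the Beer-Lambert law, and can be sketched as below. This is a minimal illustration rather than the procedure actually used: the function name, the default 1 cm path length, the absorbance reading and the count of 29 peptide bonds are all hypothetical values chosen for the example.

```python
def peptide_concentration_molar(absorbance_214, n_peptide_bonds,
                                path_length_cm=1.0,
                                epsilon_per_bond=2200.0):
    """Beer-Lambert estimate: c = A / (epsilon_per_bond * n_bonds * path length).

    epsilon_per_bond is in M^-1 cm^-1 per peptide bond (value from the text);
    the default path length is an assumption for illustration.
    """
    if n_peptide_bonds <= 0 or path_length_cm <= 0:
        raise ValueError("bond count and path length must be positive")
    return absorbance_214 / (epsilon_per_bond * n_peptide_bonds * path_length_cm)


# Hypothetical reading: A214 = 0.638 for a peptide with 29 peptide bonds,
# measured in a 0.1 cm cell.
conc = peptide_concentration_molar(0.638, 29, path_length_cm=0.1)
print(f"{conc * 1e3:.3f} mM")
```

The number of peptide bonds depends on sequence length and on whether terminal caps are counted, so it is left as an explicit argument.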
Circular dichroism (CD) experiments were performed on an AVIV Model 400 CD Spectrophotometer (AVIV Biomedical Inc., Lakewood, NJ), with optically matched 0.1 cm path length quartz cuvettes (model 110-OS; Hellma USA). Peptide solutions of 0.2 mM were prepared in 10 mM phosphate buffer at pH 7.0. Two annealing protocols were used to assess the role of assembly kinetics in modulating the distribution of species. For slow cooling, samples were incubated at 80°C for 45 minutes, gradually cooled to 50°C for 30 minutes, then 25°C for 30 minutes, and finally 4°C for 48 hours. For rapid cooling, samples were immediately moved into 4°C conditions after incubating at 80°C for 45 minutes.
Wavelength scans were conducted from 190 to 260 nm at 0 °C (averaging time 1.0 s). Values were reported as molar ellipticity corrected for concentration, sequence length and path length. Thermal denaturation was monitored using ellipticity at 225 nm. Temperature was raised from 0 to 80 °C in 0.3 °C steps with an equilibration time of 2 min at each step. Savitzky-Golay smoothing of the denaturation profiles was carried out over a span of eleven points using a third-order polynomial.14 The Tm’s of melting transitions were assigned to extrema of the first derivative of the denaturation profile. Reported values and standard error are from triplicate thermal denaturation experiments. To assess the effect of ionic strength, peptides were allowed to fold in the presence of salt at a series of concentrations: 0.0, 0.01, 0.1, 0.5 and 1 M NaCl.
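The Tm assignment described above (eleven-point, third-order Savitzky-Golay smoothing followed by locating the extremum of the first derivative) can be sketched as follows. This is a minimal illustration, not the analysis code used in this study: it relies on the standard tabulated 11-point cubic smoothing weights, a central-difference derivative, and a noise-free synthetic two-state curve. The function names and the synthetic data are assumptions made for the example.

```python
import math

# Standard tabulated 11-point cubic Savitzky-Golay smoothing weights (sum to 1).
SG11_CUBIC = [w / 429.0 for w in (-36, 9, 44, 69, 84, 89, 84, 69, 44, 9, -36)]


def savgol_smooth_11(values):
    """Smooth interior points; the 5 points at each end are left unsmoothed."""
    half = len(SG11_CUBIC) // 2
    out = list(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        out[i] = sum(w * v for w, v in zip(SG11_CUBIC, window))
    return out


def melting_temperature(temps, ellipticity):
    """Temperature at which d(theta)/dT of the smoothed curve is most negative.

    Assumes the 225 nm signal falls on unfolding, so the transition appears
    as a minimum in the first derivative. Only points whose neighbours were
    smoothed are considered.
    """
    smooth = savgol_smooth_11(ellipticity)
    best_t, best_slope = None, 0.0
    for i in range(6, len(temps) - 6):
        slope = (smooth[i + 1] - smooth[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope < best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


# Synthetic two-state melt with a midpoint at 50 degrees C, sampled in 0.3 degree steps.
temps = [0.3 * k for k in range(267)]
theta = [1.0 / (1.0 + math.exp((t - 50.0) / 2.0)) for t in temps]
print(round(melting_temperature(temps, theta), 1))
```

With real data, several transitions may be present; in that case each local minimum of the derivative would need to be inspected rather than only the global one.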
The original sequences13 were bisected into 15-residue blocks and recombined so that charged amino acids and imino acids were redistributed among the three peptides:
The total amino acid composition across the three peptides was preserved and eighteen salt bridge side chain pairs were possible, one less than the original design (Figure 1B). By eliminating the high imino-acid content POG peptide, it was expected that all competing species would be less stable than an R:Z:E heterotrimer.
Peptides E, R and Z were synthesized, purified and reconstituted in 10 mM phosphate buffer, pH 7.0 as described in the Methods. E and R formed structured homotrimers in solution, showing the characteristic circular dichroism (CD) spectrum of a triple-helix with a positive band at 225 nm (Figure 2). The structure of Z was difficult to characterize due to its tendency to aggregate in the absence of R or E. Aggregation of zwitterionic collagen peptides has been observed for many systems15–18, purportedly due to complementary inter-helical electrostatic interactions driving higher order assembly. Consistent with this, transmission electron microscopy of Z aggregates revealed nanometer scale, fibrillar structures (Figure S2). In the presence of R and/or E, Z no longer precipitated and was incorporated into soluble triple-helical heterotrimers.
A stable triple-helix was formed by combining R, Z and E. Stability was assessed by thermal denaturation monitored with CD for nine mixture stoichiometries of the three peptides: 3R, 3E, 2E:R, E:2R, 2E:Z, E:2Z, 2R:Z, R:2Z and R:Z:E (Figure 3, Table I). Melting temperatures were determined by finding extrema of dθ/dT in the first derivative plot of the denaturation profile (θ = ellipticity). Combining R, Z and E in a 1:1:1 ratio produced a major transition at 50°C, the highest observed in any of the mixtures. Only minor differences in the stability of R:Z:E mixtures were noted when using a rapid or slow annealing schedule, suggesting heterotrimer formation was not sensitive to the annealing protocol (Figure 4). A mixture of species was present as evidenced by the broad unfolding transition at lower temperatures. CD cannot discriminate between single and mixed-composition species or specify the register of the triple-helix (RZE versus REZ, ERZ, EZR, ZER, ZRE) 15,19 – but we can conclude that the optimally stable species requires the presence of all three peptides.
Binary composition heterotrimers were also formed, but these were less stable than R:Z:E. Combining R and Z resulted in two discernable transitions, one with the same stability as an R homotrimer and a less stable species that was more prevalent at higher concentrations of Z. Combining E and Z also resulted in two transitions, one with the same stability as an E homotrimer. The two transitions in both R:Z and E:Z mixtures may be explained by the presence of R:2Z + 2R:Z and E:2Z + 2E:Z, although it is not possible to assign transitions to a specific species.
The relative populations of the two species in R:Z or E:Z mixtures depended on the annealing protocol. Surprisingly, rapid annealing of the R:Z mixture favored a more uniform, stable transition than the slow annealing protocol. This is counter to what is commonly observed in nucleic acid annealing, where rapid annealing can result in a heterogeneous mixture of species. Although the mechanism for this effect is not clear, it is often observed that the folding/unfolding of collagen peptides can be slow – on the order of minutes to hours – and variations in the annealing protocol at this time scale might be expected to have dramatic effects such as this.
The R:Z:E heterotrimer was designed such that an extensive network of salt bridges would stabilize the target oligomer. To assess the salt bridge contributions to stability, ionic strength was varied using a series of salt concentrations from 0.0 to 1.0 M NaCl. Triple helical structure was observed under all conditions, but the Tm decreased by 9°C between the lowest and highest ionic strength conditions (Figure 5, Table II).
Stability of homotrimers also varied with ionic strength. E homotrimers increased in stability by 10°C upon addition of salt, suggesting that repulsive interactions between adjacent amino acids of like charge disfavored triple-helix formation. The effect was less pronounced for R homotrimers, where stability first decreased and then increased with the addition of salt. This may have been due to an initial loss of favorable guanidinium side chain to backbone carbonyl interactions,20 followed by screening of side chain repulsions at higher ionic strengths.
Success of the redesigned heterotrimer was predicated on balancing the distribution of backbone stabilizing amino acids and maintaining key side chain ion pairs specific to the target state. Straightforward sequence-structure mapping in the collagen fold has motivated sequence-derived stability prediction models based on the assumption that backbone and side chain forces additively contributed to overall stability.10 Together, the original and redesigned abc-type heterotrimer peptides provide a unique data set for developing a model that specifically addresses the relative importance of these interactions in this set of peptides:
NPOG, NEOG and NPRG were the counts of POG, EOG and PRG triplets, respectively. NR-E was the total number of Arg-Glu interchain interactions at Y-X and Y-X′ positions. The value used represented the maximal number of favorable interactions, assuming an optimal registry as depicted in Figure 1. NR-R and NE-E were the numbers of unfavorable, spatially adjacent interchain and intrachain Arg-Arg or Glu-Glu interactions, counted using a model three-dimensional structure (Figure S3). Although the strength of electrostatic interactions for solvent exposed residues over 10–15 Å was expected to be small, it has been observed in globular proteins that multiple weak interactions can have significant effects on structure and stability.21 It was further assumed that under 1.0 M NaCl conditions, both favorable and unfavorable electrostatic interactions were completely screened.
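The terms defined above are consistent with an additive model of the following general form. This is a sketch reconstructed from those definitions and from the stated assumption of additivity, not a reproduction of the printed equation: the weight symbols w are placeholder names, and the signs of the repulsive weights are left to be determined by the fit.

```latex
T_m^{\mathrm{calc}} =
    w_{POG}\,N_{POG} + w_{EOG}\,N_{EOG} + w_{PRG}\,N_{PRG}
  + w_{R\text{-}E}\,N_{R\text{-}E}
  + w_{R\text{-}R}\,N_{R\text{-}R}
  + w_{E\text{-}E}\,N_{E\text{-}E}
```

Under the complete-screening assumption for 1.0 M NaCl, the three pairwise terms would be set to zero and only the triplet terms would remain.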
Previous host-guest collagen peptide studies concluded Tm correlates linearly with free energy of folding over a wide temperature range, allowing us to evaluate the weights directly using experimental melting temperatures.11 The model was fit to experimental stabilities for twelve peptide compositions (Table III). All peptides were the same length (NPOG + NEOG + NPRG = 30 triplets), with capped termini, composed solely of POG, EOG and PRG triplets. Only homotrimers and the abc-type heterotrimer species where stoichiometry could be reasonably assigned were used. With these assumptions a final model was determined using the nonlinear least-squares fitting module Solver in Microsoft Excel. Standard error of parameters was determined using a jackknife leave-one-out method:
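The leave-one-out jackknife used for the parameter standard errors can be sketched as below. This is a deliberately simplified illustration: it fits a single weight through the origin instead of the full multi-term model, it does not reproduce the Excel Solver workflow, and the data points are hypothetical.

```python
import math


def fit_weight(counts, tms):
    """Least-squares slope through the origin for the toy model Tm = w * N."""
    return sum(n * t for n, t in zip(counts, tms)) / sum(n * n for n in counts)


def jackknife_se(counts, tms, estimator=fit_weight):
    """Leave-one-out jackknife standard error of a scalar estimator.

    SE = sqrt((n - 1) / n * sum((theta_i - mean(theta))^2)), where theta_i is
    the estimate obtained with observation i removed.
    """
    n = len(counts)
    loo = [estimator(counts[:i] + counts[i + 1:], tms[:i] + tms[i + 1:])
           for i in range(n)]
    mean = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((v - mean) ** 2 for v in loo))


# Hypothetical interaction counts and melting temperatures (degrees C).
counts = [10, 15, 20, 25, 30]
tms = [21.5, 30.9, 42.8, 51.7, 63.4]
print(f"w = {fit_weight(counts, tms):.3f} +/- {jackknife_se(counts, tms):.3f}")
```

For a model with several weights, the same loop applies with each weight refitted on every leave-one-out subset and the standard error computed per weight.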
Weights describe the change in Tm per interaction – i.e. for a (POG)10 homotrimer, the total stability is calculated as 2.1°C per triplet × 30 triplets ≈ 63°C. A reasonable correlation between observed and calculated stabilities was obtained with the full model (R2 = 0.81) (Figure 6).
Removing the two repulsive terms from the model reduced the correlation (R2 = 0.67), and if no pairwise terms were included, only a poor correlation was observed (R2 = 0.44). Our previous scoring function that did not discriminate between the backbone propensities of Arg and Glu15 also gave a weak correlation of R2 = 0.44.
The rank stability of triplets: POG > PRG > EOG, was consistent with previously determined amino acid preferences for the triple helix.22 Based on this model, a single Pro→Glu substitution would change stability by α1–α3 = −1.1°C. In a homotrimer, this would amount to a ΔTm of −4.3°C, comparable to prior estimates of −4.4°C.10 In contrast, a single Hyp→Arg substitution was predicted to reduce stability by 0.7°C, whereas the reported effect of single substitutions in homotrimers was negligible.10 Only in cases of multiple Hyp→Arg substitutions were ΔTm values of up to −0.8°C/substitution observed.23 Non-additive contributions of multiple substitutions for other amino acid types have also been observed in both homotrimers24 and heterotrimers.25
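The backbone part of the calculation can be illustrated with a short sketch. The per-triplet weights below are inferred from values quoted in the text (2.1°C per POG triplet, with Pro→Glu and Hyp→Arg substitutions costing 1.1°C and 0.7°C respectively), so they are rounded approximations rather than the fitted parameters; the function and dictionary names are hypothetical, and all pairwise electrostatic terms are ignored.

```python
# Per-triplet weights in degrees C, inferred from the text (assumption):
# POG = 2.1, EOG = 2.1 - 1.1, PRG = 2.1 - 0.7.
BACKBONE_WEIGHTS = {"POG": 2.1, "EOG": 1.0, "PRG": 1.4}


def backbone_tm(triplet_counts):
    """Sum per-triplet backbone contributions; pairwise terms are not included."""
    return sum(BACKBONE_WEIGHTS[name] * count
               for name, count in triplet_counts.items())


# (POG)10 homotrimer: 30 POG triplets across the three chains.
print(f"{backbone_tm({'POG': 30}):.1f}")

# A trimer with ten triplets of each type, electrostatics ignored.
print(f"{backbone_tm({'POG': 10, 'EOG': 10, 'PRG': 10}):.1f}")
```

The difference between such a backbone-only value and an observed Tm gives a rough sense of how much the favorable and unfavorable charge-pair terms must contribute.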
Inclusion of favorable pairwise interactions between charged side chains improves the correlation, and explains the effect of high ionic strength on the stability of the R:Z:E mixture. Although it is assumed here that Y-X and Y-X′ interactions are energetically equivalent, recent structural studies indicate Y-X′ interactions contribute more to stability, depending on the residue identities.26 As such, assuming an optimal number of favorable interactions in R:Z:E or R10:E10:POG may result in underestimating the contribution of individual salt bridges to overall stability.
The R:Z:E mixture can assume six registries assuming optimal interstrand hydrogen bonding: RZE, ZER, ERZ, REZ, EZR and ZRE. ZER and ERZ have the potential to form sixteen favorable charge pairs, two fewer than RZE, which corresponds to a modest 1.6°C difference in melting temperature. We do not expect the scoring function to reliably distinguish which of these registries is formed. In contrast, EZR, ZRE and REZ are not able to form any charge pairs and their calculated Tm’s are 38°C, significantly lower than the observed transition. It is unlikely that these states are significantly populated in the R:Z:E mixture.
The collagen heterotrimer has become a useful target for developing positive and negative protein design methods. Our redesign of an abc-type heterotrimer by sequence recombination used a purely negative design strategy to enhance the specificity of the target state without significantly compromising its stability. Modulating stability by redistributing imino-acids among the three sequences was recently used to generate a single composition collagen ABC heterotrimer,27 indicating that it may serve as a general strategy for tuning the oligomerization energy landscape.
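The six registries can be enumerated and grouped with a short sketch. Grouping by cyclic rotation of the chain order is an observation about the labels only, not a structural calculation, but it reproduces the split described above between registries that can form charge pairs and those that cannot. The per-pair value is simply the quoted 1.6°C divided by two pairs; the function name is hypothetical.

```python
from itertools import permutations


def cyclic_class(registry):
    """Canonical label for a registry under cyclic rotation of the chain order."""
    rotations = [registry[i:] + registry[:i] for i in range(len(registry))]
    return min(rotations)


registries = ["".join(p) for p in permutations("RZE")]
groups = {}
for reg in registries:
    groups.setdefault(cyclic_class(reg), []).append(reg)

for members in groups.values():
    print(members)

# Weight per favorable pair implied by the text: two fewer pairs <-> 1.6 degrees C.
per_pair = 1.6 / 2
print(f"{per_pair:.1f} degrees C per favorable pair")
```

One group contains RZE, ZER and ERZ, and the other contains REZ, EZR and ZRE.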
Quantitative analysis and molecular modeling coupled with biophysical and structural characterization are improving our understanding of the molecular forces that drive collagen folding and stability. Although the model developed here is limited to specific amino acid compositions, peptide length and types of side chain interactions, it highlights the relative importance of backbone propensities and favorable and unfavorable side chain electrostatic interactions to collagen stability.
Understanding how sequence recombination alters stability and specificity should provide insight into the evolution of heterospecific natural collagens. Fibrous proteins with periodic structure present unique challenges to traditional phylogenetic analysis methods due to the inherent repetition of low information content sequence motifs.28 The exon organization of vertebrate collagens shows evidence of gene duplication and exon shuffling.29–31 Synthetic proteins such as these can be used to model how extant collagens may have evolved from the repetition and shuffling of simpler structural domains.
This work was supported by the NIH DP2 OD006478-01 and NSF DMR-0907273.