Collagen is the most abundant protein in higher animals, accounting for approximately one-third of total protein mass. Individual collagen chains trimerize into triple-helices, which further assemble into higher order structures such as long fibers and mesh-like networks (1). These structures provide tensile strength and flexibility to tissues. Collagens also play important functional roles in mediating cell polarity, initiating thrombosis and modulating tumor metastasis (1). There are twenty-eight known collagen types in humans (2), many with multiple subtypes. Subtypes are often co-expressed to assemble in heterotrimeric triple-helices. The most abundant collagen, Type I, is composed of two α1(I) and one α2(I) chains. Type IV collagen, a primary component of basement membranes, can exist as 2α1(IV) or α5(IV) heterotrimers (3). Stoichiometry is controlled at multiple levels, from gene expression to protein-protein interactions. Natural collagens are over a thousand amino acids in length and contain both globular and triple-helical domains. As such, model peptide systems have been essential tools in exploring the molecular basis for stability, specificity and stoichiometry of collagen assembly. We use computational protein design to study this important class of proteins.
At one extreme, the design of collagens is quite easy; fibril-forming domains are defined by a canonical Gly-X-Y triplet repeat. These repeats can extend for a thousand amino acids, resulting in triple-helices around 300 nm long. The X and Y positions are frequently proline and (4R)-hydroxyproline (abbreviated as Hyp or O), respectively. At the other extreme, understanding the thermodynamics of collagen assembly well enough to support atomic-resolution computational design is a challenging task. Unlike globular proteins, the fibrillar regions of collagens do not have a hydrophobic core. Instead, the triple-helix structure is mediated by a network of inter-chain backbone hydrogen bonds (4). Sidechains of non-glycine positions project into solvent, where the energetic contributions of these groups to folding and stability are highly dependent on interactions with water (6). High-resolution crystal structures of collagen peptides show an extended, structured hydration network surrounding the triple-helix. Modeling solvent contributions to protein folding and structure has always been a major challenge for computational methods, which must strike a balance between the efficiency of continuum solvation and the accuracy of explicit water models (8). It is also unclear to what extent hydration serves to stabilize the collagen triple-helix. Raines and colleagues have demonstrated that stereoelectronic effects, such as the predisposition of hydroxyproline toward the Cγ-exo ring pucker, favor main-chain bond angles associated with the triple-helix (10). Such forces are essential for collagen folding but are not commonly included in molecular mechanics or design potentials. Additionally, the folding of collagen is slow and does not appear to be two-state (12), complicating the estimation of sequence contributions to the energies of native and unfolded states. Computational design of collagen peptides will provide a powerful approach for exploring the complexities of fibrous protein folding.
Current understanding of collagen structure and folding has come in large part from extensive biophysical characterization of model peptide systems. Natural collagens have a higher than normal frequency of acidic and basic residues in triple-helical domains, suggesting that electrostatic interactions play an important role in stability and the specificity of chain-chain recognition during folding (14). Seminal host-guest model collagen peptide studies by Brodsky and colleagues established that favorable interchain charge-pair interactions can increase thermal stability (17). Most model peptide studies have focused on homotrimers, although the introduction of disulfides has proved useful in stabilizing engineered heterotrimers (20). Complementary pairing of electrostatic interactions can also drive the formation of highly stable heterotrimeric triple-helices (22).
Designing hetero-oligomeric assemblies poses unique challenges for protein design. An effective design will optimize the stability of the target fold while disfavoring competing states. These two components are referred to as ‘positive’ and ‘negative’ design (25). Characterizing the heterotrimer energy landscape with peptides A, B and C requires consideration of the target state, ABC, and the twenty-six competing states: AAA, BBB, CCC, AAB, BCA, etc. Positive design targets the ABC state by maximizing its stability based on charge-pair interactions between the three peptides. Additionally, an explicit negative design component selects for sequences of A, B and C with unfavorable interaction energies when computed in alternate, competing oligomerization states. Since each round of sequence selection requires calculations over multiple states rather than one, we developed a simple, discrete model of collagen heterotrimer stability that allows rapid computation.
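The state count and the two design components described above can be illustrated with a minimal sketch. It is not the scoring function used in this study: the charge table, the per-pair scores of -1 and +1, the `INTERFACE_OFFSETS` values and all function names are hypothetical placeholders, chosen only to show how 27 ordered chain assignments (the target plus 26 competitors) might be enumerated and how a target score and a specificity gap could be computed from a discrete charge-pair count.

```python
from itertools import product

# Assumed charge assignment: Lys/Arg = +1, Asp/Glu = -1, all else neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

# Hypothetical residue offsets for the three chain-chain interfaces
# (leading->middle, middle->trailing, trailing->leading). Placeholders only,
# not taken from any structure.
INTERFACE_OFFSETS = (0, 0, 1)


def pair_score(lead, lag, offset=0):
    """Sum q_i * q_j over paired positions: -1 per opposite-charge pair,
    +1 per like-charge pair, 0 if either residue is neutral."""
    return sum(CHARGE.get(a, 0) * CHARGE.get(b, 0)
               for a, b in zip(lead, lag[offset:]))


def state_score(chains):
    """Score one ordered trimer given as (leading, middle, trailing)."""
    c1, c2, c3 = chains
    o12, o23, o31 = INTERFACE_OFFSETS
    return (pair_score(c1, c2, o12)
            + pair_score(c2, c3, o23)
            + pair_score(c3, c1, o31))


def enumerate_states(labels=("A", "B", "C")):
    """All ordered chain assignments: 3**3 = 27 states for three peptides."""
    return list(product(labels, repeat=3))


def design_objective(peptides, target=("A", "B", "C")):
    """Return (target score, gap to the most favorable competing state).

    Lower scores are more favorable. The target score is the positive
    design term; a larger positive gap is the negative design term."""
    scores = {s: state_score([peptides[x] for x in s])
              for s in enumerate_states(tuple(sorted(peptides)))}
    competitors = {s: v for s, v in scores.items() if s != target}
    gap = min(competitors.values()) - scores[target]
    return scores[target], gap
```

Calling `design_objective` with a dictionary that maps the labels A, B and C to candidate sequences returns both terms, so a sequence search could rank candidates on either or on a weighted combination. A negative gap indicates that at least one competing state scores more favorably than ABC under this toy model.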
In this study, we present a computational design protocol that concurrently optimizes target stability and specificity over competing states, with the goal of designing three peptides that assemble as an ABC heterotrimer. The project is novel both in its selection of collagen as a target and in its explicit incorporation of positive and negative design. We characterize the first generation of peptides produced by this approach and analyze design successes and failures, with the aims of improving the design protocol and understanding the role of electrostatics in collagen folding.