The fundamental and emergent rules make it possible to encode funnel-shaped energy landscapes. We can sculpt energy landscapes to be strongly funnelled by designing secondary structure patterns that favour the tertiary motifs present in the desired topology and disfavour non-native motifs. The desired structure is then further stabilized by using RosettaDesign (ref. 8) to obtain sequences with favourable non-local interactions such as complementary hydrophobic core packing. The latter step involves purely positive design because the energy of the desired structure is optimized without regard to competing states. By contrast, the design of sequences that favour specific secondary structure patterns also has elements of negative design, because non-native conformations are disfavoured by the local structural preferences of the protein backbone captured by the rules.
We tested this approach by attempting to design strongly funnelled landscapes for five different folds (Supplementary Discussion 2). The first step is to choose secondary structure lengths that favour the desired fold and disfavour alternatives. We illustrate this choice with Fold-I, the classic ferredoxin-like fold (the leftmost fold in the figure). The secondary structure elements are, in order, β1α1β2β3α2β4. To assign the lengths of the loops and strands, we apply the emergent rules to the αββ- and ββα-triples and the βα- and αβ-rules to the two βαβ-units, (β1α1β2) and (β3α2β4). Reading directly from Supplementary Fig. 1
, we find that for strand length 7 the ideal loop lengths between successive secondary structure elements are 3, 2, 2, 3 and 2 (from the amino to the carboxy terminus). To assign the lengths of the helices, we find from Supplementary Fig. 10 that for strand length 7 the optimal helix length is 18. We can apply the same procedure to each of the other folds to obtain the corresponding ideal secondary structure lengths: for Folds-II, -IV and -V, we treat (αβα) as (αβ)P and apply the corresponding two fundamental rules.
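The Fold-I lengths quoted above can be assembled into a secondary structure string of the kind passed to the folding simulations. The following is a minimal sketch rather than the authors' tooling: the E/H/L alphabet, the function name `build_ss_string` and the absence of terminal loop residues are assumptions made for illustration, and the element order assumes the standard βαββαβ arrangement of the ferredoxin-like fold.

```python
# Illustrative sketch only; names and the E/H/L alphabet are assumptions.
STRAND_LEN = 7                 # strand length used in the text
HELIX_LEN = 18                 # optimal helix length for strand length 7 (from the text)
LOOP_LENS = [3, 2, 2, 3, 2]    # loop lengths, amino to carboxy terminus (from the text)
ELEMENTS = "EHEEHE"            # assumed ferredoxin-like order: beta-alpha-beta-beta-alpha-beta


def build_ss_string(elements, strand_len, helix_len, loop_lens):
    """Concatenate strands (E), helices (H) and loops (L) into one string."""
    if len(loop_lens) != len(elements) - 1:
        raise ValueError("need exactly one loop between successive elements")
    lengths = {"E": strand_len, "H": helix_len}
    parts = []
    for i, element in enumerate(elements):
        parts.append(element * lengths[element])
        # A loop follows every element except the last one.
        if i < len(loop_lens):
            parts.append("L" * loop_lens[i])
    return "".join(parts)


ss = build_ss_string(ELEMENTS, STRAND_LEN, HELIX_LEN, LOOP_LENS)
print(ss)
print(len(ss))  # 76 residues: 4 strands x 7, 2 helices x 18, 12 loop residues
```

Keeping the element order separate from the lengths means the same helper could be reused for the other folds once their lengths have been read from the rules.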
Derivation of secondary structure lengths from the rules for five protein topologies
To build tertiary backbone structures from the two-dimensional representations of the protein folds, we carry out multiple independent Rosetta folding simulations using the secondary structure strings obtained from the rules. For Folds-I, -III and -V, a significant fraction of trajectories produced the desired topology because the secondary structure lengths were chosen specifically to encode it. Folds-II and -IV are not distinguished by the rules; to resolve this ambiguity, we varied the secondary structure lengths and used folding simulations to select lengths strongly favouring one or the other fold. For larger proteins, such degeneracies are likely to increase and additional rules may need to be identified to resolve them. Within the population of structures with the desired topology, there is still considerable variety in the distances and angles between the secondary structure elements, the loop conformations and the twist of the β-sheet. This variation is important because it provides a range of starting points for designing sequence-structure pairs with very low energy, as described in the next paragraph.
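The selection between Fold-II and Fold-IV described above amounts to comparing, for each candidate set of lengths, the fraction of folding trajectories that end in the target topology. The sketch below assumes each trajectory has already been assigned a topology label by some external classifier; the function names, candidate names and labels are hypothetical and are not data from this work.

```python
from collections import Counter


def topology_fraction(labels, target):
    """Fraction of trajectories whose final topology label equals target."""
    if not labels:
        return 0.0
    return Counter(labels)[target] / len(labels)


def pick_lengths(results, target):
    """Return the candidate whose trajectories most often reach target.

    results maps a candidate secondary structure string (or any identifier)
    to the list of topology labels observed across its independent runs.
    """
    return max(results, key=lambda name: topology_fraction(results[name], target))


# Hypothetical labels for two candidate length sets; purely illustrative.
results = {
    "candidate_A": ["Fold-II", "Fold-IV", "Fold-II", "Fold-II"],
    "candidate_B": ["Fold-IV", "Fold-IV", "Fold-II", "other"],
}

best_for_fold_ii = pick_lengths(results, "Fold-II")
best_for_fold_iv = pick_lengths(results, "Fold-IV")
print(best_for_fold_ii, best_for_fold_iv)
```

In practice one would probably also require the winning fraction to exceed some minimum, since a candidate can be the best of a poor set.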
Up to this point, specific sequence information has not been introduced; the representations are of the protein backbone alone. For each backbone in the ensemble, we then use Monte Carlo simulated annealing to identify amino acids and side-chain conformations that give rise to very low-energy structures. This is carried out using fixed-backbone RosettaDesign (ref. 8) calculations followed by relaxation of the structure of the backbone and the side chains in the Rosetta all-atom energy function (ref. 28). These sequence design and structure refinement calculations are then iterated (ref. 8) to generate a tightly packed hydrophobic core with a packing density approaching that of close-packed crystals. Larger hydrophobic amino acids (Ile, Leu and Phe) are favoured in the core to create a strong driving force for folding (ref. 29). Negative design is applied to the edge β-strands and the protein surface to destabilize non-native conformations and disfavour oligomerization: inward-pointing polar residues are introduced in the strands and hydrophobic patches are removed from the surface. The designed structures are then filtered according to energy, packing (as assessed by RosettaHoles; ref. 30) and the local sequence–structure compatibility (Methods) to disfavour other structures (this last criterion is effectively a negative design step). Finally, for each sequence passing these filters, 200,000–400,000 independent Rosetta ab initio
structure prediction simulations starting from an extended chain (ref. 22) are performed to map out the folding energy landscapes. Roughly 10% of the designs have funnel-shaped energy landscapes leading into the designed structures (compare with Supplementary Fig. 11), and these are selected for experimental characterization. Proteins designed with this protocol (summarized in Supplementary Fig. 12) by construction have consistent local and non-local interactions. Notably, the only globular protein designed de novo before this work, Top7 (ref. 8), also satisfies our rules and has consistent local and non-local interactions.
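One simple way to screen the resulting landscapes for funnel shape is to ask whether the lowest-energy predicted structures lie close to the designed model. The sketch below is not the selection criterion used here, which is not specified at this level of detail; the function name, the RMSD cutoff (assumed to be in Å), the fraction of decoys inspected and the toy numbers are all illustrative assumptions.

```python
def is_funnelled(decoys, rmsd_cutoff=2.0, top_fraction=0.01):
    """Crude funnel test on (rmsd_to_design, energy) pairs.

    Returns True when every decoy in the lowest-energy top_fraction lies
    within rmsd_cutoff of the designed structure. Both thresholds are
    illustrative assumptions, not values taken from the text.
    """
    if not decoys:
        raise ValueError("no decoys supplied")
    ranked = sorted(decoys, key=lambda d: d[1])      # lowest energy first
    n_top = max(1, int(len(ranked) * top_fraction))  # inspect at least one decoy
    return all(rmsd <= rmsd_cutoff for rmsd, _ in ranked[:n_top])


# Toy landscapes as (RMSD to design, energy) pairs; not real simulation output.
funnel_like = [(1.0, -120.0), (1.5, -115.0), (5.0, -90.0), (8.0, -80.0)]
frustrated = [(9.0, -120.0), (1.0, -110.0), (6.0, -95.0), (2.0, -85.0)]

print(is_funnelled(funnel_like, top_fraction=0.5))
print(is_funnelled(frustrated, top_fraction=0.5))
```

A test of this kind only inspects the bottom of the landscape; checking that energy rises with increasing RMSD across the whole decoy set would be a stricter complement.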
Characterization of design for each of the five folds