Unlike random heteropolymers, natural proteins fold into unique ordered structures. Understanding how these are encoded in amino-acid sequences is complicated by energetically unfavourable non-ideal features—for example kinked α-helices, bulged β-strands, strained loops and buried polar groups—that arise in proteins from evolutionary selection for biological function or from neutral drift. Here we describe an approach to designing ideal protein structures stabilized by completely consistent local and non-local interactions. The approach is based on a set of rules relating secondary structure patterns to protein tertiary motifs, which make possible the design of funnel-shaped protein folding energy landscapes leading into the target folded state. Guided by these rules, we designed sequences predicted to fold into ideal protein structures consisting of α-helices, β-strands and minimal loops. Designs for five different topologies were found to be monomeric and very stable and to adopt structures in solution nearly identical to the computational models. These results illuminate how the folding funnels of natural proteins arise and provide the foundation for engineering a new generation of functional proteins free from natural evolution.
For proteins to fold, the interactions favouring the native state must collectively outweigh the non-native interactions, resulting in funnel-shaped energy landscapes1–3. However, it is not obvious how the ubiquitous non-covalent interactions that stabilize proteins—van der Waals interactions, hydrogen bonding and hydrophobic packing—can selectively favour the biologically relevant unique native structure over the vastly larger number of non-native conformations. Protein design provides an opportunity to investigate this problem: hypotheses about how unique folded structures are encoded in amino-acid sequences can be evaluated by designing proteins de novo and experimentally assessing how well they fold4–6.
Previous work on protein fold design has focused on stabilizing the desired folded state7–13. However, robustly designing protein structures with funnel-shaped energy landscapes may require not only the stabilization of a unique folded state7–13 (positive design), but also the destabilization of non-native states14–16 (negative design). Protein design methodology has been developed to find sequences that stabilize a desired folded state and destabilize specific non-native states14–20. However, the challenge of disfavouring the vast number of non-native states quite generally remains an unsolved problem.
We hypothesized that funnel-shaped energy landscapes can be robustly generated by requiring that the local interactions between residues close along the linear sequence, which determine protein secondary structure, and the non-local interactions between residues distant along the chain, which stabilize protein tertiary structure, consistently favour the same folded conformation21. We sought principles for designing ‘ideal’ proteins that have this property. To disfavour non-native states systematically by negative design, we focused on the local interactions because non-local interactions vary strongly with even small changes in tertiary structure. We began by investigating the mapping between local interactions favouring specific secondary structure patterns and protein tertiary structure motifs, seeking local structure patterns that strongly favour single tertiary motifs over all others.
We focused on a basis set of tertiary structure motifs consisting of two or three secondary structure elements adjacent in the linear sequence, which make extensive intramotif interactions. We investigated the mapping from secondary structure patterns to these tertiary structure features using a combination of de novo folding calculations with the Rosetta program22 and analyses of naturally occurring protein structures in the Protein Data Bank. Multiple protein folding simulations were carried out for each motif for a range of different lengths of the strands, helices and loops, using a sequence-independent backbone model. For each choice of lengths, we computed the fraction of trajectories that arrived at the desired motif topology. These calculations revealed that the extent of folding to a particular motif is very strongly dependent on the lengths of the secondary structures. Detailed study of these dependencies identified three fundamental rules, which are described in the following section.
The fundamental rules describe the junctions between adjacent secondary structure elements (Fig. 1). There are three distinct junction classes in the αβ-folds we sought to design—ββ, βα and αβ—and three corresponding rules.
Statement of the rules requires the definition of the chirality (L versus R) of a ββ-unit and the orientation (P versus A) of βα- and αβ-units (Fig. 1). The chirality of a ββ-unit is defined on the basis of the orientation of the Cα-to-Cβ vector, CαCβ, of the strand residue preceding or following the connecting loop: letting u be a vector along the first secondary structure element and v be a vector from the centre of the first secondary structure element to the centre of the second secondary structure element, if (u × v) • CαCβ (where a cross denotes vector product and a dot denotes scalar product) is positive the unit is right handed (R), and if it is negative the unit is left handed (L) (Fig. 1d). For βα- and αβ-units in which the β-strand is in a β-sheet that the helix packs against, the CαCβ vectors in the strand are roughly collinear with the vector between the centres of the strand and the helix. We define the orientation of a βα-unit to be parallel (P) if the vector from strand to helix is parallel to the CαCβ vector of the last residue in the strand, and to be antiparallel (A) if the two are antiparallel (Fig. 1b). The orientation of an αβ-unit is P if the CαCβ vector of the first residue in the strand is parallel to the vector from helix to strand, and is A if the two are antiparallel (Fig. 1c) (see Supplementary Methods 4 and 5 for details).
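As an illustration of these definitions, the sign tests can be sketched in a few lines of Python. This is a minimal sketch, not the code used in the study: the function names are hypothetical, the vectors are assumed to be supplied as three-component tuples, and the choice of which strand residue supplies the CαCβ vector is left to the caller.

```python
def cross(a, b):
    """Vector (cross) product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Scalar (dot) product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def bb_chirality(u, v, ca_cb):
    """Chirality of a ββ-unit from the sign of (u × v) • CαCβ.

    u     : vector along the first secondary structure element
    v     : vector from the centre of the first element to the centre of the second
    ca_cb : Cα-to-Cβ vector of the strand residue next to the connecting loop
    """
    triple = dot(cross(u, v), ca_cb)
    if triple > 0:
        return "R"
    if triple < 0:
        return "L"
    return None  # degenerate geometry: the sign is undefined


def unit_orientation(centre_to_centre, ca_cb):
    """Orientation of a βα- or αβ-unit: P if the two vectors point the same way, else A."""
    return "P" if dot(centre_to_centre, ca_cb) > 0 else "A"
```

For example, `bb_chirality((1, 0, 0), (0, 1, 0), (0, 0, 1))` gives `"R"`, and flipping the CαCβ vector gives `"L"`.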
The chirality of β-hairpins is determined by the length of the loop between the two strands. Rosetta folding simulations of a peptide with two equal-length β-strands connected by a variable-length loop were carried out on a sequence-independent backbone model (Methods Summary, Methods and Supplementary Methods 1). The chirality (Fig. 1d) of the end points of multiple independent Monte Carlo trajectories was computed. The results (Fig. 1a, left) are quite striking: two- and three-residue loops almost always give rise to L-hairpins, whereas five-residue loops give rise primarily to R-hairpins. These results suggest that the chirality of β-hairpins is determined by the chirality (L-amino acids versus D-amino acids) and local structural preferences of the polypeptide chain; indeed, only a restricted set of loop types have been found to be compatible with ββ-junctions23. Analysis of ββ-units in known protein structures (Supplementary Methods 3) shows that the chirality of ββ-units in native structures is correlated with loop length in a manner very similar to the simulations (Fig. 1a, right). Consistent with the idea that torsional strain is responsible for the trends, the calculated torsion energies of loops in native structures for two- and three-residue loops are lower for L-hairpins, and those for five-residue loops are lower for R-hairpins (Supplementary Fig. 2). This rule allows control over the pleating of β-hairpins.
The preferred orientation of βα-units is P for two-residue loops and A for three-residue loops. Secondary-structure-constrained folding simulations similar to those described in the previous paragraph strongly show this trend, and it is also observed in native protein structures (Fig. 1b). The rule arises in part from the bendability of the protein backbone (Supplementary Fig. 3). This rule is very useful for both positive and negative design, as it allows control of the side of a β-sheet that a helix will pack onto.
The preferred orientation of αβ-units is P. In secondary-structure-constrained folding simulations, this trend is observed strongly for loops two residues in length and for longer lengths when the loop provides a hydrogen-bonded capping interaction to stabilize the helix and does not extend the strand (Fig. 1c, left, and Supplementary Fig. 4). A very similar trend is again observed in native protein structures (Fig. 1c, right).
It must be emphasized that the three rules are largely independent of the amino-acid sequence of the secondary structures or connecting loops. As such, they must arise from the intrinsic chirality and local structural preferences of the polypeptide chain rather than from sequence-specific contributions. Whereas local sequence–structure relationships have been extensively studied24–27, there has been much less work on sequence-independent properties (the cataloguing of the discrete sets of loops compatible with junctions between secondary structure elements is a notable exception23). These rules provide a powerful way to perform negative design at the backbone level.
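The three rules as stated above can be summarized as a small lookup table. The sketch below is an illustration only: the dictionary and function names are hypothetical, only the loop lengths named explicitly in the text are encoded, and the αβ-rule for loops longer than two residues (which depends on a helix-capping interaction) is deliberately left out.

```python
# Preferred state for each junction class, keyed by loop length in residues.
# Only loop lengths stated explicitly in the text are included.
RULES = {
    "bb": {2: "L", 3: "L", 5: "R"},  # ββ-rule: hairpin chirality
    "ba": {2: "P", 3: "A"},          # βα-rule: orientation
    "ab": {2: "P"},                  # αβ-rule: orientation (two-residue loops only)
}


def preferred_state(junction, loop_length):
    """Return the preferred chirality or orientation, or None if the text gives no rule."""
    return RULES[junction].get(loop_length)


def loop_lengths_for(junction, state):
    """Return the loop lengths that favour the requested state for a junction class."""
    return sorted(n for n, s in RULES[junction].items() if s == state)
```

Used in the design direction, `loop_lengths_for("ba", "A")` returns `[3]`: to place a helix in the A orientation after a strand, a three-residue loop is the choice the rule suggests.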
The next level of complexity in αβ-proteins beyond two secondary structure elements is segments of three consecutive secondary structure elements. Secondary-structure-constrained Rosetta folding simulations revealed strong dependencies of the chirality (Supplementary Fig. 1d) of ββα- and αββ-units and the foldability of βαβ-units on the lengths of the connecting loops and the secondary structure elements. These dependencies are formulated in emergent rules (Supplementary Fig. 1 and Supplementary Discussion 1), which follow from the fundamental rules described in the previous section. The rules specify how to choose the lengths of secondary structure elements and the connecting loops to favour a desired conformation of a ββα-, αββ- or βαβ-unit.
The fundamental and emergent rules make possible the encoding of funnel-shaped energy landscapes. We can sculpt energy landscapes to be strongly funnelled by designing secondary structure patterns that favour the tertiary motifs present in the desired topology and disfavour non-native motifs. The desired structure is then further stabilized by using RosettaDesign8 to obtain sequences with favourable non-local interactions such as complementary hydrophobic core packing. The latter step involves purely positive design because the energy of the desired structure is optimized without regard to competing states, whereas the design of sequences that favour specific secondary structure patterns also has elements of negative design because non-native conformations are disfavoured by the local structural preferences of the protein backbone captured by the rules.
We tested this approach by attempting to design strongly funnelled landscapes for five different folds (Fig. 2 and Supplementary Discussion 2). The first step is to choose secondary structure lengths that favour the desired fold and disfavour alternatives. We illustrate how to choose the secondary structure lengths that favour a desired topology with Fold-I, the classic ferredoxin-like fold (Fig. 2, leftmost fold). The secondary structure elements are, in order, β1α1β2β3α2β4. To assign the lengths of the loops and strands, we apply the emergent rules to the αββ- and ββα-triples and the βα- and αβ-rules to the two βαβ-units: (β1α1)A(α1β2)P(α1β2β3)L(β2β3α2)R(β3α2)A(α2β4)P. Reading directly from Fig. 1 and Supplementary Fig. 1, we find that for strand length 7 the ideal loop lengths between successive secondary structure elements are 3, 2, 2, 3 and 2 (from the amino to the carboxy terminus). To assign the lengths of the helices, we find from Supplementary Fig. 10 that for strand length 7 the optimal helix length is 18. We can apply the same procedure to each of the other folds to obtain the corresponding ideal secondary structure lengths (Fig. 2): for Folds-II, -IV and -V, we treat (αβα) as (αβ)P(βα)P/A and apply the corresponding two fundamental rules.
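The Fold-I lengths quoted above can be assembled into a secondary structure string of the kind used as input to the folding simulations. The following is a minimal sketch under stated assumptions: the one-letter codes E (strand), H (helix) and L (loop) and the function name are illustrative choices, and any terminal residues outside the first and last strands are omitted.

```python
def build_ss_string(elements, element_lengths, loop_lengths):
    """Concatenate secondary structure elements and the loops between them.

    elements        : string of 'b' (strand) and 'a' (helix), N- to C-terminus
    element_lengths : residues in each element, same order
    loop_lengths    : residues in each connecting loop (one fewer than elements)
    """
    if len(loop_lengths) != len(elements) - 1:
        raise ValueError("need exactly one loop between each pair of elements")
    codes = {"b": "E", "a": "H"}
    parts = []
    for i, (kind, n) in enumerate(zip(elements, element_lengths)):
        parts.append(codes[kind] * n)
        if i < len(loop_lengths):
            parts.append("L" * loop_lengths[i])
    return "".join(parts)


# Fold-I: β1 α1 β2 β3 α2 β4 with strand length 7, helix length 18
# and loop lengths 3, 2, 2, 3, 2 as read from the rules.
fold_i = build_ss_string("babbab", [7, 18, 7, 7, 18, 7], [3, 2, 2, 3, 2])
```

The resulting string has 28 strand, 36 helix and 12 loop positions; it describes only the rule-derived core of the pattern, not the full length of any expressed construct.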
To build tertiary backbone structures from the two-dimensional representations of protein folds depicted in Fig. 2, we carry out multiple independent Rosetta folding simulations using the secondary structure strings obtained from the rules. For Folds-I, -III and -V, a significant fraction of trajectories produced the desired topology because the secondary structure lengths were chosen specifically to encode it. Folds-II and -IV are not distinguished by the rules, and to resolve this ambiguity we varied the secondary structure lengths and used folding simulations to select lengths strongly favouring one or the other fold. For larger proteins, such degeneracies are likely to increase and additional rules may need to be identified to resolve them. Within the population of structures with the desired topology, there is still considerable variety in the distances and angles between the secondary structure elements, the loop conformations and the twist of the β-sheet. This variation is important because it provides a range of starting points for designing sequence-structure pairs with very low energy as described in the next paragraph.
Up to this point, specific sequence information has not been introduced; the representations are of the protein backbone alone. For each backbone in the ensemble, we then use Monte Carlo simulated annealing to identify amino acids and side-chain conformations that give rise to very low-energy structures. This is carried out using fixed-backbone RosettaDesign8 calculations followed by relaxation of the structure of the backbone and the side chains in the Rosetta all-atom energy function28. These sequence design and structure refinement calculations are then iterated8 to generate a tightly packed hydrophobic core with a packing density approaching that of close-packed crystals. Larger hydrophobic amino acids (Ile, Leu and Phe) are favoured in the core to create a strong driving force for folding29. Negative design is applied to the edge β-strands and the protein surface to destabilize non-native conformations and disfavour oligomerization: inward-pointing polar residues are introduced in the strands and hydrophobic patches are removed from the surface. The designed structures are then filtered according to energy, packing (as assessed by RosettaHoles30) and the local sequence–structure compatibility (Methods) to disfavour other structures (this last criterion is effectively a negative design step). Finally, for each sequence passing these filters, 200,000–400,000 independent Rosetta ab initio structure prediction simulations starting from an extended chain22 are performed to map out the folding energy landscapes. Roughly 10% of the designs have funnel-shaped energy landscapes leading into the designed structures (Fig. 3a; compare with Supplementary Fig. 11) and these are selected for experimental characterization. Proteins designed with this protocol (summarized in Supplementary Fig. 12) by construction have consistent local and non-local interactions. Notably, the only globular protein designed de novo before this work, Top7 (ref. 8), also satisfies our rules and has consistent local and non-local interactions.
We obtained synthetic genes encoding 11 designs for Fold-I, 12 for Fold-II, 14 for Fold-III, 5 for Fold-IV and 12 for Fold-V (Supplementary Table 8). None of these proteins is homologous to any known protein (BLAST E-value <0.02 against the NCBI nr database of non-redundant protein sequences). The proteins were expressed, purified and characterized by circular dichroism spectroscopy, size-exclusion chromatography combined with multi-angle light scattering (SEC-MALS), and NMR spectroscopy. For all five folds, most of the designs are expressed and soluble and many are extremely stable (Table 1 and Supplementary Tables 1–5). Data for the most stable monomeric design for each fold that had a well-resolved NMR spectrum (Di-I_5, Di-II_10, Di-III_14, Di-IV_5 and Di-V_7; ‘Di’ indicates designed ideal protein, the Roman numeral is the fold type and the number is the identifier of the design) are shown in Fig. 3, Supplementary Fig. 13 and Supplementary Table 6. These five proteins are soluble at concentrations of 0.9–1.6 mM, have far-ultraviolet circular dichroism spectra characteristic of αβ-proteins and have cooperative unfolding transitions with a free energy of unfolding of >5 kcal mol−1 (Fig. 3b, c and Supplementary Table 6). The designed proteins were found to be monomeric by SEC-MALS (Supplementary Fig. 13). The two-dimensional 1H–15N heteronuclear single quantum coherence (HSQC) spectra show the expected number of well-dispersed sharp peaks (Fig. 3d), indicating that the designed proteins are well packed. The solution structures of all five designs were determined by solution-state NMR spectroscopy (Fig. 4). Extensive validation analyses, including excellent agreement between back-calculated and measured NMR data (Supplementary Table 7), suggest that the NMR structures are of high quality. The structures are remarkably consistent with the computational design models for both the protein backbone and the core side chains (Fig. 4, Supplementary Fig. 14 and Supplementary Table 6).
We have demonstrated that strongly funnelled landscapes can be designed by encoding consistency between the local and non-local interactions using rules that relate secondary structure lengths to tertiary structure patterns. The rules, which arise from the chirality and local structural preferences of the polypeptide chain, make possible the simultaneous positive design of interactions favouring the desired structure and negative design against competing alternatives. It is plausible that the same principles shape the folding landscapes of naturally occurring proteins, which are more frustrated but still exhibit the remarkable property of having unique native states considerably lower in energy than the vast number of alternative topologies. This idea is supported by the fact that the relationships between secondary structure patterns and tertiary structure motifs we identified in simulations are also observed in native structures (Fig. 1 and Supplementary Figs 1, 5, 7, 9 and 10); as in our design strategy, the disfavouring of the myriad alternative states may be achieved by naturally occurring sequences through the stabilization of local structures that disfavour non-native topologies31,32.
The design principles and methodology we have described should allow the ready design of a wide range of robust and stable protein building blocks for the next generation of engineered functional proteins33–41. Almost all protein design and engineering efforts so far have repurposed naturally occurring proteins that evolved for some other, often unrelated, function35–41. It should now become possible to custom-design protein scaffolds ideal for the desired function, and to build larger assemblies42,43 and materials from robust ideal building blocks.
Rosetta folding simulations using a sequence-independent backbone model were carried out in the studies of the fundamental rules (Fig. 1), the emergent rules (Supplementary Fig. 1) and the building of tertiary backbone structures in the rule-based designs. These simulations are referred to as secondary-structure-constrained folding simulations in the main text because the phi and psi angles of each residue are limited to the region of the Ramachandran plot compatible with the assigned secondary structure. We first introduce the backbone model and then describe the fragment assembly method45 used for simulating the backbone model.
The backbone model consists of main-chain atoms (N, NH, Cα, C and CO) and Cβ atoms with a pseudo-atom representing a generic side chain (the centroid model of Rosetta22). The Rosetta potential function terms and weights are as follows: steric repulsion (vdw = 1.0), overall compaction (rg = 1.0), secondary structure pairings (ss_pair = 1.0, rsigma = 1.0 and hs_pair = 1.0) and hydrogen bonds (hbond_sr_bb = 1.0, hbond_lr_bb = 1.0). For the steric radius of the side-chain pseudo-atom, the radius of Val was used.
Fragment assembly45 was used for sampling conformations of the backbone model. Backbone fragment sets consisting of 1, 3 or 9 consecutive residue fragments were prepared in advance from a non-redundant set of X-ray structures46; the fragments have information only on the phi, psi and omega torsion angles. We performed Monte Carlo simulations in which, in each attempted trial, a new conformation is generated by replacing the torsion angles (phi, psi and omega) of a randomly selected frame consisting of 1, 3 or 9 consecutive residues with the torsion angles of a randomly selected fragment compatible with the assigned secondary structure. Importantly, in the calculations for the fundamental rules, we used only one-residue fragments to avoid the possibility that the evolutionary history of natural protein structures would bias the simulation results. Because we found that the fundamental rules are observed both in the simulations and in the natural proteins (Fig. 1), we used all fragment lengths in the simulations relating to the emergent rules and the rule-based designs. In the calculations for the fundamental rules, the total number of Monte Carlo steps in one trajectory was 500 × (length of the simulated chain), and the temperature was 1.0. In the emergent rules and rule-based designs, the total number of Monte Carlo steps in one trajectory was 300 × (length of the simulated chain), and the temperature was 1.5.
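The sampling scheme described above can be illustrated with a toy Metropolis loop over one-residue fragments. This is a schematic sketch only and not the Rosetta implementation: the fragment table, the energy function and all names are hypothetical placeholders, the standard Metropolis acceptance criterion is assumed, and only the step count (500 × chain length) and temperature (1.0) are taken from the text.

```python
import math
import random

# Hypothetical one-residue fragments (phi, psi, omega) per secondary structure type.
TOY_FRAGMENTS = {
    "H": [(-60.0, -45.0, 180.0), (-65.0, -40.0, 180.0)],
    "E": [(-120.0, 130.0, 180.0), (-140.0, 135.0, 180.0)],
    "L": [(-60.0, -45.0, 180.0), (-120.0, 130.0, 180.0), (60.0, 45.0, 180.0)],
}


def toy_energy(conformation):
    """Placeholder energy; a real run would score the 3D backbone model."""
    return sum(abs(phi + 100.0) for phi, _, _ in conformation) / 100.0


def fragment_monte_carlo(ss, fragments, energy,
                         steps_per_residue=500, temperature=1.0, seed=0):
    """Metropolis Monte Carlo with one-residue fragment insertion.

    Each trial replaces the torsions of one randomly chosen residue with a
    random fragment compatible with that residue's assigned secondary structure.
    """
    rng = random.Random(seed)
    conf = [rng.choice(fragments[s]) for s in ss]
    e = energy(conf)
    for _ in range(steps_per_residue * len(ss)):
        pos = rng.randrange(len(ss))
        trial = list(conf)
        trial[pos] = rng.choice(fragments[ss[pos]])
        e_trial = energy(trial)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            conf, e = trial, e_trial
    return conf, e


final_conf, final_energy = fragment_monte_carlo("EEEELLEEEE", TOY_FRAGMENTS, toy_energy)
```

Because every trial draws only from fragments matching the assigned secondary structure, each residue stays within the corresponding region of torsion space throughout the trajectory, which is the sense in which the simulations are secondary-structure-constrained.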
Sequence design was performed using the RosettaDesign approach8 with several extensions.
After designing sequences, we relaxed the backbone and side chains of the designed structures28. These sequence design and structure refinement calculations were iterated. The designed structures were then filtered on the basis of their Rosetta all-atom energy22, packing as assessed by RosettaHoles30, and the local sequence and structure compatibility. Finally, we visually inspected the designed structures and mutated buried polar and exposed hydrophobic residues using Foldit47.
Rosetta command lines are provided in Supplementary Data to perform the design protocol.
To evaluate the compatibility between the local sequence and the local structure, we collected 200 fragments for each nine-residue frame in the designed sequence from a non-redundant set of X-ray structures based on the sequence similarity and secondary structure prediction22 (the standard Rosetta fragment generation protocol for ab initio structure prediction). Then, for each frame, we calculated the root mean squared deviation between the designed local structure and each of the 200 fragments. Designs were ranked on the basis of the total number of fragments for which the root mean squared deviation was less than 1.0 Å, and those with high values were selected.
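The ranking criterion reduces to a simple count once the per-frame deviations are available. The sketch below assumes the root mean squared deviations have already been computed elsewhere and are supplied as one list per nine-residue frame; the function names and data layout are hypothetical.

```python
def local_compatibility_score(rmsds_per_frame, cutoff=1.0):
    """Count fragments, summed over all frames, with RMSD below the cutoff (in Å)."""
    return sum(1 for frame in rmsds_per_frame for rmsd in frame if rmsd < cutoff)


def rank_designs(designs, cutoff=1.0):
    """Sort design names from most to least locally compatible.

    designs : dict mapping a design name to its list of per-frame RMSD lists
    """
    return sorted(designs,
                  key=lambda name: local_compatibility_score(designs[name], cutoff),
                  reverse=True)
```

In the protocol described above each frame would contribute 200 values, so the maximum possible score for a design is 200 times its number of nine-residue frames.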
For all designed sequences, a Gly–Ser was added at the C terminus to give a spacer between the designed region and the C-terminal 6×His tag. The genes encoding the designed sequences, which were cloned into plasmid pET29b, were obtained from GenScript. The designed proteins were expressed in Escherichia coli BL21 Star (DE3) cells as non-labelled proteins for all designs for Folds-I and -II, and as U-15N-labelled proteins for all designs for Folds-III to -V. The non-labelled proteins were expressed using auto-induction media48, and the U-15N-labelled proteins were expressed using MJ9 minimal media49, which contain 15N ammonium sulphate as the sole nitrogen source and 12C glucose as the sole carbon source. The expressed proteins with a 6×His tag at the C terminus were purified through a nickel affinity column. The purified proteins were then dialysed against PBS buffer (137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4 and 1.8 mM KH2PO4) at pH 7.4; this buffer was used for all the experiments except NMR structure determination. The expression, solubility and purity of the designed proteins were assessed by SDS–polyacrylamide gel electrophoresis and mass spectrometry (TSQ LC/MS, Thermo Scientific).
All circular dichroism data were collected on an Aviv 62A DS spectrometer. Far-ultraviolet circular dichroism spectra of designed proteins were measured from 260 to 200 nm for 14–25 µM protein samples in PBS buffer (pH 7.4) at 25, 50, 75 and 95 °C in a 1-mm-path-length cuvette. The protein concentrations were determined from the absorbance at 280 nm (ref. 50) using an ultraviolet spectrophotometer (NanoDrop, Thermo Scientific). Tm is the melting temperature, at which the populations of folded and unfolded protein are equal during temperature denaturation. Chemical denaturations with GuHCl were monitored at 220 nm for 3–4 µM protein samples in PBS buffer (pH 7.4) at 25 °C in a 1-cm-path-length cuvette. The GuHCl concentration was automatically controlled by a Microlab titrator (Hamilton). The chemical denaturation curves were fitted by nonlinear least-squares analysis using a two-state unfolding and linear extrapolation model51. The free-energy change for the unfolding transition, ΔG, and the value representing its dependency on the denaturant, the m-value (of which a higher value indicates higher cooperativity), were obtained from the fitting.
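For reference, the two-state linear extrapolation model can be written compactly: the unfolding free energy is assumed to fall linearly with denaturant concentration, ΔG([D]) = ΔG(H2O) − m[D], and the unfolded fraction follows from the equilibrium constant. The sketch below shows the textbook form of the model function only, not the authors' fitting code; the flat (concentration-independent) baselines and all names are simplifying assumptions.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def fraction_unfolded(denaturant, dg_h2o, m_value, temp_k=298.15):
    """Unfolded fraction at a given denaturant concentration (M).

    dg_h2o  : unfolding free energy in the absence of denaturant (kcal/mol)
    m_value : dependence of the free energy on denaturant (kcal/mol/M)
    """
    dg = dg_h2o - m_value * denaturant          # linear extrapolation
    return 1.0 / (1.0 + math.exp(dg / (R_KCAL * temp_k)))


def cd_signal(denaturant, dg_h2o, m_value, native, unfolded, temp_k=298.15):
    """Observed signal as a population-weighted average of two flat baselines."""
    f_u = fraction_unfolded(denaturant, dg_h2o, m_value, temp_k)
    return native * (1.0 - f_u) + unfolded * f_u
```

The transition midpoint falls at [D] = ΔG(H2O)/m, where the unfolded fraction is exactly one half; a nonlinear least-squares routine would adjust `dg_h2o`, `m_value` and the baselines to match the measured curve.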
SEC-MALS experiments were performed using a miniDAWN TREOS static light-scattering detector (Wyatt Technology) combined with an HPLC system (LC 1200 Series, Agilent Technologies). One hundred microlitres of 400–700 µM protein samples in PBS buffer (pH 7.4) was injected into a Superdex 75 10/300 GL column (GE Healthcare) equilibrated with PBS buffer at a flow rate of 0.5 ml min−1. The protein concentrations were calculated from the absorbance at 280 nm detected by the HPLC system. Static light-scattering data were collected at three different angles, 41.4°, 90.0° and 138.6°, at 658 nm. These data were analysed using ASTRA software (version 5.3.4, Wyatt Technology) with a change in the refractive index with concentration (dn/dc value) of 0.185 ml g−1.
To assess the core packing of designed proteins, one-dimensional 1H NMR spectra were measured for the designs for Folds-I and -II, and two-dimensional 1H-15N HSQC spectra were measured for the designs for Folds-III to -V. The spectra were collected for 0.5–1.5 mM protein samples in 90% 1H2O/10% 2H2O PBS buffer (pH 7.4) at 25 °C on a Varian INOVA 600 MHz spectrometer. The most stable monomeric design with a well-resolved NMR spectrum for each fold (Di-I_5, Di-II_10, Di-III_14, Di-IV_5 and Di-V_7) was selected for NMR structure determination.
The five designs were expressed and purified following the standard, largely automated NESG protocol52. The designs were expressed in E. coli BL21 (DE3) pMGK cells as U-15N,5%13C-enriched proteins, and U-15N,U-13C-enriched proteins using MJ9 minimal media49. The U-15N,5%13C-labelled proteins were generated for stereospecific assignments of methyl groups of Val and Leu53 and for measurements of residual dipolar couplings54. The expressed proteins were purified using an ÄKTAxpress (GE Healthcare) two-step protocol consisting of IMAC (HisTrap HP column, GE Healthcare) and gel filtration chromatography (HiLoad 26/60 Superdex 75 column, GE Healthcare). The purified proteins were dissolved in 90% 1H2O/10% 2H2O buffer containing 20 mM MES, 200 mM NaCl, 10 mM DTT, 5 mM CaCl2 and 0.02% NaN3 at pH 6.5 for Di-I_5 and Di-II_10; 100 mM NaCl, 5.6 mM Na2HPO4, 1.1 mM KH2PO4 and 3 mM DTT at pH 7.5 for Di-III_14; and 100 mM NaCl, 5 mM DTT, 0.02% NaN3, 10 mM Tris-HCl at pH 7.5 for Di-IV_5 and Di-V_7. The expression, solubility and purity of the five proteins were assessed by SDS–polyacrylamide gel electrophoresis and matrix-assisted laser desorption/ionization–time of flight mass spectrometry.
Experimental NMR structure determination was carried out without any knowledge of the design model. For NMR structure determination, all NMR spectra were recorded at 25 °C using cryogenic NMR probes. Triple-resonance NMR data were collected on Varian INOVA 600 MHz or Bruker AVANCE 800 MHz spectrometers, and simultaneous three-dimensional 15N/13Caliphatic/13Caromatic-edited nuclear Overhauser enhancement spectroscopy (NOESY55; mixing time, 100 ms) and three-dimensional 13C-edited aromatic NOESY (mixing time, 100 ms) spectra were acquired on the Bruker AVANCE 800 MHz spectrometer. Two-dimensional constant-time 1H-13C HSQC spectra, with 28-ms and 42-ms constant-time delays, were recorded for the U-15N,5%13C-enriched samples on the Varian INOVA 600 MHz spectrometer to obtain stereospecific assignments of methyl groups of Val and Leu53. Backbone 15N-1H residual dipolar couplings in two alignment media, PEG and phage, were determined from J-modulated spectra54 for Di-II_10, Di-III_14 and Di-V_7. All NMR data were processed using the program NMRPIPE56 and analysed using the program XEASY57. Spectra were referenced to external DSS. Sequence-specific resonance assignments were determined as described previously58. Chemical shift data were deposited in the Biological Magnetic Resonance Data Bank with BMRB IDs 16387, 18558, 18145, 18561 and 18465 for Di-I_5, Di-II_10, Di-III_14, Di-IV_5 and Di-V_7, respectively. Initial NOESY peak lists containing expected intraresidue, sequential and α-helical medium-range NOE peaks were generated from the obtained assignments and then manually edited by visual inspection of the NOESY spectra. Subsequent manual peak picking was then used to identify remaining, primarily long-range NOEs58. Backbone dihedral angle constraints were derived from the chemical shifts using the program TALOS+59 for residues located in well-defined secondary structure elements, and were used for structure determination. 
Residual dipolar couplings were used as orientational constraints for well-defined residues during structure determination for Di-II_10, Di-III_14 and Di-V_7. The program CYANA60,61 was used to assign NOEs automatically and to calculate the structure. The 20 conformers with the lowest target function values were refined in explicit water solvent62 using the program CNS63. RPF analysis of AUTOSTRUCTURE64,65 was used in parallel to guide the iterative cycles of noise/artefact peak removal, peak picking and NOE assignments. The final structure coordinates were deposited in the Protein Data Bank. The structural statistics and global structure quality factors including VERIFY3D66, PROSAII67, PROCHECK68, and MOLPROBITY69 raw and statistical Z-scores were computed using PDBSTAT and PSVS 1.4 (ref. 70). The global goodness-of-fit of the final structure ensembles with the NOESY peak list data was determined using the RPF analysis program71. The NMR data are available from http://psvs-1_4-dev.nesg.org/ideal_proteins/.
We thank N. Grishin for suggesting target folds for design, P. Rajagopal for one-dimensional NMR measurements of Folds-I and -II, and J. Siegel for mass spectrometry measurements. We also thank P.-S. Huang and Y.-E. A. Ban for computational tools; J. L. Gallaher for experimental assistance; J. Castellanos for help with designing Fold-IV; H.-W. Lee, K. Pederson and J. Prestegard for measurements of residual dipolar couplings; and S. Khare, F. DiMaio, I. Andre, S. Fleishman, J. Mills, S. Takada, S. Fuchigami and G. Chikenji for comments on the manuscript. This work was supported by HHMI, DOE, DARPA, DTRA and the National Institute of General Medical Sciences Protein Structure Initiative (PSI:Biology) programme, grant U54 GM094597. N.K. was also supported by Japan Society for the Promotion of Science (JSPS) Postdoctoral Fellowships for Research Abroad.
Supplementary Information is available in the online version of the paper.
Author Contributions N.K., R.T.-K., G.L., G.T.M. and D.B. designed the research. N.K. performed folding simulations and analysed natural proteins. N.K. wrote program code. N.K. and R.T.-K. performed computational design work: Di-I_5 and Di-IV_5 were designed by N.K., and Di-II_10, Di-III_14 and Di-V_7 were designed by R.T.-K. R.T.-K. expressed, purified and characterized the designed proteins by biochemical assay. R.X. and T.B.A. prepared isotope-enriched protein samples for NMR structure determination. G.L. collected NMR data and determined the solution NMR structures. N.K., R.T.-K., G.L., G.T.M. and D.B. wrote the manuscript.
Author Information The NMR structures of the five designs have been deposited in the RCSB Protein Data Bank under the accession numbers 2KL8 (Di-I_5), 2LV8 (Di-II_10), 2LN3 (Di-III_14), 2LVB (Di-IV_5) and 2LTA (Di-V_7). NMR data have been deposited in the Biological Magnetic Resonance Data Bank under the accession numbers 16387 (Di-I_5), 18558 (Di-II_10), 18145 (Di-III_14), 18561 (Di-IV_5) and 18465 (Di-V_7).
The authors declare no competing financial interests. Readers are welcome to comment on the online version of the paper.