Protein Sci. 2018 January; 27(1): 286–292.
Published online 2017 November 6. doi:  10.1002/pro.3327
PMCID: PMC5734313

OPUS‐CSF: A C‐atom‐based scoring function for ranking protein structural models

Gang Xu,1 Tianqi Ma,2,3 Tianwu Zang,2,3 Qinghua Wang,4 and Jianpeng Ma1,2,3,4 (corresponding author)

Abstract

We report a C‐atom‐based scoring function, named OPUS‐CSF, for ranking protein structural models. Rather than using the traditional Boltzmann formula, we built a scoring function (CSF score) based on the native distributions (derived from the entire PDB) of coordinate components of mainchain C (carbonyl) atoms on selected residues of peptide segments of 5, 7, 9, and 11 residues in length. In tests of decoy recognition, OPUS‐CSF recognized up to 257 native structures out of 278 targets in 11 commonly used decoy sets, significantly outperforming other popular all‐atom empirical potentials. The average correlation coefficient with TM‐score was also comparable with those of other potentials. OPUS‐CSF is a highly coarse‐grained scoring function that requires only partial mainchain information as input and is very fast. Thus, it is suitable for applications at an early stage of structure building.

Keywords: protein structure modeling, protein folding, coarse‐graining, scoring function, decoy recognition

Introduction

A potential function plays a central role in predicting protein structures. Generally, there are two kinds of potential functions: physics‐based potentials and knowledge‐based potentials. Physics‐based potentials typically are the all‐atom molecular mechanics force‐fields,1, 2, 3, 4, 5 such as CHARMM1,2 and AMBER.4 They also include coarse‐grained potentials such as MARTINI,6 UNRES7, 8 and OPEP.9

The knowledge‐based potentials are derived from statistical analysis of known structures and are widely used in structural prediction.10–41 They usually perform better than the physical potentials in structural prediction. In general, knowledge‐based potentials can be constructed either at the coarse‐grained residue level17, 21–31 or at the atomic level.32–41 Although coarse‐grained potentials may not be rigorous, they help to focus on essential features and exclude less important details, thus reducing computational cost.42, 43 The performance of a coarse‐grained potential depends on how the coarse‐graining scheme is designed. For example, the OPUS‐Ca potential30 uses the positions of Cα atoms as input, calculates other atomic positions as pseudo‐positions, and significantly reduces the computing cost. Other applications of coarse‐grained models using Cα positions are also reported in the literature.44–55

In this work, unlike traditional empirical potential functions that use the Boltzmann formula, we built a scoring function based on the native distributions of coordinate components of mainchain C (carbonyl) atoms on a few selected residues of small peptide segments of 5, 7, 9, and 11 residues in length. A lookup table, termed the configurational native distribution (CND) lookup table, was first generated for the native distributions of coordinate components by analyzing peptide segments in the entire Protein Data Bank (PDB). Then the scoring function, termed the CSF scoring function, was calculated for a particular test structure by comparing the information of its segments with the CND lookup table. The performance of OPUS‐CSF was tested on 11 commonly used decoy sets. The results indicated that OPUS‐CSF was able to identify significantly more native structures from their decoys than other empirical potentials. The correlation coefficients between CSF scores and TM‐scores were comparable to those of popular all‐atom empirical potentials. Most importantly, OPUS‐CSF achieved this performance despite its highly coarse‐grained nature. This indicates the advantages of OPUS‐CSF in terms of speed and applicability in the early stage of structural modeling, which is vitally important for applications such as building structural models from intermediate‐resolution data from experimental techniques like cryogenic electron microscopy (cryo‐EM).

Results and Discussion

We compared the performance of OPUS‐CSF on 11 commonly used decoy sets with that of popular all‐atom potential functions. Table 1 lists the results for the 5‐residue segment case (OPUS‐CSF5) and the all‐segment combined case (OPUS‐CSF). In the 5‐residue segment case, OPUS‐CSF5 successfully recognized 244 out of 278 native structures from their decoys and had an average Z‐score (–3.56) nearly identical to that of GOAP (–3.57). In the combined‐segment case, OPUS‐CSF performed even better: it recognized 257 out of 278 native structures from their decoys and had an average Z‐score (–4.12) better than that of GOAP (–3.57). It is interesting that, although OPUS‐CSF is a highly coarse‐grained scoring function, its performance is significantly better than that of the all‐atom potentials.
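The two figures of merit quoted above can be illustrated with a short sketch. The text does not spell out the Z‐score formula, so the conventional decoy‐set definition (native score minus the mean decoy score, divided by the standard deviation of the decoy scores) is assumed here, as is the use of the population standard deviation. The function names are hypothetical and not part of OPUS‐CSF.

```python
from statistics import fmean, pstdev

def native_z_score(native_score, decoy_scores):
    """Z-score of the native structure relative to its decoys.

    Assumed definition: (native - mean(decoys)) / std(decoys).
    A more negative value means the native is better separated.
    """
    return (native_score - fmean(decoy_scores)) / pstdev(decoy_scores)

def native_recognized(native_score, decoy_scores):
    """True if the native has a strictly lower score than every decoy."""
    return native_score < min(decoy_scores)

if __name__ == "__main__":
    decoys = [4.0, 6.0, 8.0]
    print(native_z_score(2.0, decoys))     # negative: native scores below the decoys
    print(native_recognized(2.0, decoys))  # True
```

Counting the targets for which `native_recognized` is true, and averaging `native_z_score` over targets, would give numbers of the kind reported in Table 1.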

Table 1
The results of OPUS‐CSF5 (5‐residue segment) and OPUS‐CSF (combined segment length) on 11 decoy sets compared with different potentials

We also calculated the Pearson correlation coefficients between the CSF score and the TM‐score56 in all decoy sets. The results are shown in Table 2. OPUS‐CSF has an average correlation coefficient comparable to those of GOAP and OPUS‐PSP, despite the fact that OPUS‐CSF is highly coarse‐grained and the other two are all‐atom potentials.
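As a reminder of what is being averaged in Table 2, the sketch below computes a Pearson correlation coefficient between a list of scores and a list of TM‐scores for one decoy set. It is a generic textbook implementation, not code from OPUS‐CSF, and the function name is hypothetical. Because a lower CSF score and a higher TM‐score both indicate a better model, a useful scoring function is expected to give a negative coefficient.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

if __name__ == "__main__":
    csf_scores = [1.0, 2.0, 3.0, 4.0]   # made-up values for illustration
    tm_scores = [0.8, 0.6, 0.4, 0.2]    # made-up values for illustration
    print(pearson(csf_scores, tm_scores))  # perfectly anti-correlated toy data
```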

Table 2
Average Pearson correlation coefficients of CSF scores with TM‐scores

For further analysis of the method, we use the 5‐residue segment case as an example. Figure 1 shows the histogram of standard deviations of the coordinate components of mainchain C (carbonyl) atoms of the 1st and 5th residues in the CND lookup table. The distribution clearly peaks at a very small value, indicating that the coordinate components are clustered in a narrow distribution; that is, the configurational distributions of the 5‐residue peptide segments are narrow,57 which provides a foundation for the success of OPUS‐CSF. The narrow configurational distribution of small peptide fragments is also seen in other studies.58 In addition, the average value of the standard deviation is 1.20 Å.

Figure 1
The histogram of standard deviations of the coordinate components in the CND lookup table for 5‐residue segment case. The distribution peaks at a very small value of standard deviation indicating that the coordinate components of the 1st and 5th ...

It should be noted that, in the implementation of OPUS‐CSF, we assume that the smaller the CSF score, the more likely the structure is to be native. This is an approximation because even a native structure usually does not have a zero CSF score. However, the narrow distributions of standard deviations of the coordinate components of mainchain C (carbonyl) atoms (Fig. 1) suggest small scores for the native structures. Figure 2 shows the population distribution of the CSF scores for 278 native structures in 11 decoy sets (per independent coordinate component). The average value of the native CSF scores is 0.84 and the standard deviation is 0.27. Thus, in native structures, the deviations of the coordinate components from their average values are, on average, less than one standard deviation of the coordinate component distribution in the CND lookup table. The fluctuation of the native CSF scores is also very small.

Figure 2
The population distribution of CSF scores for 278 native structures in 11 decoy sets. The X‐axis is the CSF score (per independent coordinate component variable). The Y‐axis is the histogram of the population.

Figure 3 shows the frequencies of sequence repeating in the CND lookup table in the 5‐residue case. In principle, the more times a sequence repeats in PDB, the better the statistics for that sequence in the CND lookup table. In the 5‐residue case, half of the sequences repeat >26 times in the distribution. The largest value on the X‐axis is 29,618, reached by one sequence. In constructing the CND lookup table, there is always a trade‐off between sequence diversity and sequence repeating frequency in PDB.

Figure 3
The distribution of frequency of sequence repeating in the CND lookup table. The X‐axis is the repeating frequency, and the Y‐axis is the number of sequences with particular repeating frequency. Sequences that repeat less than five times ...

We examined OPUS‐CSF using segments of different lengths. As the segment length increases, the ratio of the number of segments that appear more than five times to the total number of segments in PDB naturally decreases (Table 3). Likewise, if Coverage is defined as the ratio between the number of segments available in the CND lookup table and the total number of segments of a test sequence, the average coverage over the 11 decoy sets (278 targets in total) decreases as the segment length increases. A test sequence with <20% of its segments available in the CND lookup table, that is, with coverage <20%, is regarded as Unknown; the number of unknowns increases as the segment length increases. More details of OPUS‐CSF at different segment lengths can be found in Supplemental Information.
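The Coverage and Unknown definitions above translate directly into a few lines of code. The sketch below assumes that the CND lookup table can be queried by segment sequence (here any container supporting `in`, such as a set or dict keyed by one‐letter sequence strings). The function names and the default argument are illustrative choices, not taken from the OPUS‐CSF distribution.

```python
def coverage(sequence, length, table):
    """Fraction of the sequence's segments (step size 1) found in the table."""
    n_total = len(sequence) - length + 1
    if n_total <= 0:
        return 0.0  # sequence shorter than one segment: nothing can be covered
    n_found = sum(
        1 for start in range(n_total)
        if sequence[start:start + length] in table
    )
    return n_found / n_total

def is_unknown(sequence, length, table, threshold=0.20):
    """A target is regarded as Unknown when its coverage is below 20%."""
    return coverage(sequence, length, table) < threshold

if __name__ == "__main__":
    toy_table = {"ACDEF", "CDEFG", "DEFGH"}       # made-up table entries
    print(coverage("ACDEFGHIKL", 5, toy_table))   # 3 of 6 segments are present
    print(is_unknown("ACDEFGHIKL", 5, toy_table))
```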

Table 3
The results of OPUS‐CSF built with different lengths of residue segments

The 5‐residue case delivers the best performance in terms of decoy recognition (244 out of 278 native structures recognized; Table 4). However, the Z‐scores are better for the longer‐segment cases. This is probably because the longer segments preserve more sequence homology information.

Table 4
The performance of OPUS‐CSF based on different lengths of residue segments on the 11 decoy sets

For the 5‐residue case, we also tested a scenario in which the CND lookup table was constructed using four residues (1, 2, 4, and 5) instead of the two terminal residues (1, 5). The number of native structures recognized and the Z‐score are 226 and −3.60, respectively, whereas for (1, 5) they are 244 and −3.56 (as indicated in Table 4). This is very interesting, as it indicates that using the two terminal residues (1, 5) captures a better level of coarse graining than using more residues (1, 2, 4, and 5).

OPUS‐CSF has some obvious advantages. First, the CND lookup table is constructed directly from the entire PDB, and it contains all allowed configurational information of the native segments (at least for the ones repeated more than five times in PDB). The results seem to indicate that it is better than Boltzmann‐formula‐based methods. Second, OPUS‐CSF is very fast, especially for longer polypeptide chains. This is because the entire chain is scanned once, linearly, and only partial mainchain atom coordinates are required to calculate the CSF score for a structure. Unlike other potentials such as GOAP40 and OPUS‐PSP,34 no inter‐atomic distances need to be calculated. We want to emphasize that, in modeling protein structures, an empirical potential function or a scoring function should be fast and accurate. In the early stage of modeling, it is advantageous for the scoring function to require a minimal amount of structural information. In this regard, OPUS‐CSF seems to be a good choice.

Methods

Scanning through the polypeptide chain with a step size of one residue, we collected small peptide segments with sequence lengths of 5, 7, 9, and 11 residues and searched for their configurations in the entire PDB. In total, we downloaded 130,054 PDB structures on June 7, 2017 via ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb. The sequences that appeared fewer than five times in PDB were discarded. The number five was chosen empirically. Peptide segments with poorly resolved structures, such as those with broken bonds, were not included.
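The segment collection step described above can be sketched as a sliding window followed by a frequency filter. The example works on one‐letter sequence strings only and leaves out structure parsing and the exclusion of poorly resolved segments. The function names are hypothetical, and the cutoff follows the statement that sequences appearing fewer than five times were discarded.

```python
from collections import Counter

def iter_segments(sequence, length):
    """Yield (start_index, segment) for every window of `length`, step size 1."""
    for start in range(len(sequence) - length + 1):
        yield start, sequence[start:start + length]

def frequent_segments(sequences, length, min_count=5):
    """Count segment sequences over all chains and keep those seen >= min_count times."""
    counts = Counter(
        segment
        for sequence in sequences
        for _, segment in iter_segments(sequence, length)
    )
    return {segment: n for segment, n in counts.items() if n >= min_count}

if __name__ == "__main__":
    # Toy "database": made-up chains, not real PDB sequences.
    chains = ["ACDEFG"] * 5 + ["ACDEF"] + ["KLMNP"] * 4
    print(frequent_segments(chains, 5))  # KLMNP occurs only 4 times and is dropped
```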

Here we use 5‐residue segment case as an example to illustrate the details of the procedure. The ratio of segments that appear more than five times to all segments in PDB is 75.1%, which means we can utilize 75.1% of the information in the whole PDB using 5‐residue segments (also see Table 3 in Results and Discussion).

A local molecular coordinate system was defined for every segment using the positions of three main‐chain atoms in the middle residue. The origin was set at the Cα atom, the X‐axis was defined along the line connecting the Cα and C (carbonyl) atoms, the Y‐axis was in the Cα–C–O plane, parallel to the component of the C–O vector that was perpendicular to the X‐axis, and the Z‐axis was defined correspondingly (Fig. 4).

Figure 4
Local molecular coordinate system in OPUS‐CSF defined by the mainchain atoms of the 3rd residues. The origin is on Cα atom. The X‐axis is along the Cα–C line. Y‐axis is in the plane of Cα–C–O ...
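The construction of the local frame can be written as a Gram–Schmidt step followed by a cross product. The sketch below makes two assumptions that the text leaves open: the C–O vector is taken as pointing from C to O, and the Z‐axis is chosen as X × Y so that the frame is right‐handed. All function names are illustrative.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return (v[0] / norm, v[1] / norm, v[2] / norm)

def local_frame(ca, c, o):
    """Return (origin, (x_axis, y_axis, z_axis)) from the middle residue's CA, C, O."""
    x_axis = normalize(sub(c, ca))                 # along CA -> C
    co = sub(o, c)                                 # assumed direction: C -> O
    along_x = dot(co, x_axis)
    # Remove the part of C->O that is parallel to X; the rest defines Y.
    perpendicular = sub(co, (x_axis[0] * along_x, x_axis[1] * along_x, x_axis[2] * along_x))
    y_axis = normalize(perpendicular)
    z_axis = cross(x_axis, y_axis)                 # assumed right-handed frame
    return ca, (x_axis, y_axis, z_axis)

def to_local(point, frame):
    """Express a Cartesian point in the local frame."""
    origin, axes = frame
    d = sub(point, origin)
    return tuple(dot(d, axis) for axis in axes)

if __name__ == "__main__":
    # Made-up coordinates chosen so the local frame coincides with the global one.
    frame = local_frame((0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.0, 0.0))
    print(to_local((1.0, 2.0, 3.0), frame))
```

In this picture, the carbonyl C atoms of the recorded residues would be passed through `to_local` before their coordinate components are stored.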

For a 5‐residue segment with a specific sequence, we saved the mainchain C (carbonyl) coordinates of the 1st and 5th residues in the local coordinate system, denoted as (x1,y1,z1) and (x5,y5,z5). Under our assumption, we treated the coordinate components x1,y1,z1,x5,y5,z5 as six independent variables. By scanning through the entire PDB, we generated six independent distributions of these variables, called the configurational native distributions (CNDs) of 5‐residue segments. We then calculated the means and standard deviations of the distributions and kept them as the CND lookup table.
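A lookup table of this kind can be sketched as a dictionary keyed by segment sequence, holding one mean and one standard deviation per coordinate component. The observations are assumed to be already expressed in the local coordinate system. The use of the population standard deviation is an assumption (the text does not say which estimator was used), and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pstdev

def build_cnd_table(observations, min_count=5):
    """Build {sequence: (means, stds)} from (sequence, components) pairs.

    `components` is a tuple such as (x1, y1, z1, x5, y5, z5) in the local frame.
    Sequences observed fewer than `min_count` times are left out.
    """
    grouped = defaultdict(list)
    for sequence, components in observations:
        grouped[sequence].append(components)

    table = {}
    for sequence, rows in grouped.items():
        if len(rows) < min_count:
            continue
        columns = list(zip(*rows))  # one column per independent variable
        means = [fmean(column) for column in columns]
        stds = [pstdev(column) for column in columns]  # assumed: population std
        table[sequence] = (means, stds)
    return table

if __name__ == "__main__":
    # Made-up observations for illustration only.
    toy = [("AAAAA", (float(i), 0.0, 0.0, 0.0, 0.0, 2.0 * i)) for i in range(1, 6)]
    toy += [("CCCCC", (1.0,) * 6)] * 2  # too rare, will be dropped
    print(build_cnd_table(toy))
```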

For a test structure, we scanned through its sequence with 5‐residue segments. For each segment and its sequence, we obtained the Z‐scores of the six independent variables from the CND lookup table. Finally, we added up the absolute values of the Z‐scores of all variables for all segments; this sum is called the CSF score. We assume that the structure with the smallest CSF score has the largest likelihood of being the native structure.
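Written out, the score described above is the sum of |v − μ| / σ over all variables of all segments, with μ and σ taken from the lookup table entry for the segment's sequence. The sketch below assumes a table of the form `{sequence: (means, stds)}`. Skipping segments absent from the table and variables with zero standard deviation are choices made here for the illustration, not documented behaviour of OPUS‐CSF. The variable count is returned as well, so that a per‐variable score (as plotted in Figure 2) can be formed.

```python
def csf_score(segments, table):
    """Sum of absolute Z-scores over all variables of all segments.

    segments: iterable of (sequence, components) for the test structure,
              with components already in the local frame.
    table:    {sequence: (means, stds)}.
    Returns (total_score, number_of_variables_scored).
    """
    total = 0.0
    n_variables = 0
    for sequence, components in segments:
        entry = table.get(sequence)
        if entry is None:
            continue  # assumption: segments missing from the table are skipped
        means, stds = entry
        for value, mean, std in zip(components, means, stds):
            if std == 0:
                continue  # assumption: degenerate distributions are skipped
            total += abs(value - mean) / std
            n_variables += 1
    return total, n_variables

if __name__ == "__main__":
    # Made-up table and segments for illustration only.
    toy_table = {"AAAAA": ([0.0] * 6, [1.0, 1.0, 1.0, 2.0, 2.0, 2.0])}
    toy_segments = [
        ("AAAAA", (1.0, -1.0, 0.0, 2.0, -4.0, 0.0)),
        ("XXXXX", (9.0,) * 6),  # not in the table
    ]
    total, n = csf_score(toy_segments, toy_table)
    print(total, n, total / n)
```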

The segments of varying lengths are denoted as 5(1, 3, 5), 7(2, 4, 6), 9(1, 3, 5, 7, 9) and 11(2, 4, 6, 8, 10). In a segment of the form 5(1, 3, 5), for example, the first number 5 is the segment length, 1 and 5 in the parentheses are the residues for which we record the C (carbonyl) atom positional distributions in the local coordinate system, and 3 is the residue on which the local coordinate system is defined. For 9(1, 3, 5, 7, 9) and 11(2, 4, 6, 8, 10), four atoms are used for recording mainchain C (carbonyl) positional distributions, so 12 independent variables are used in total.
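The notation can be captured in a small table that maps each segment length to its listed residue positions. In all four cases the middle entry of the list is the residue carrying the local frame and the remaining entries are the recorded residues, which gives 6 or 12 variables. The names below are hypothetical and only serve to make the bookkeeping explicit.

```python
# Segment length -> 1-based residue positions, as in 5(1, 3, 5) etc.
SEGMENT_SPECS = {
    5: (1, 3, 5),
    7: (2, 4, 6),
    9: (1, 3, 5, 7, 9),
    11: (2, 4, 6, 8, 10),
}

def split_spec(positions):
    """Return (frame_residue, recorded_residues) for one segment specification."""
    frame_residue = positions[len(positions) // 2]  # middle entry carries the frame
    recorded = tuple(p for p in positions if p != frame_residue)
    return frame_residue, recorded

def n_variables(positions):
    """Three coordinate components (x, y, z) per recorded carbonyl C atom."""
    _, recorded = split_spec(positions)
    return 3 * len(recorded)

if __name__ == "__main__":
    for length, positions in SEGMENT_SPECS.items():
        print(length, split_spec(positions), n_variables(positions))
```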

The CSF score can be calculated either based on one particular segment length or by combining all segment lengths. In the case of combined segment lengths, the final CSF score is a linear sum of the CSF scores of the different segment lengths. No weighting function is introduced for the contributions of different segment lengths.
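The unweighted combination, together with the assumption that the lowest score marks the most native‐like model, can be sketched as follows. The per‐length scores are assumed to have been computed beforehand, and the function names and the input layout are illustrative only.

```python
def combined_csf(scores_by_length):
    """Unweighted linear sum of the CSF scores obtained for each segment length."""
    return sum(scores_by_length.values())

def rank_models(models):
    """Sort model names by ascending combined CSF score (best candidate first).

    models: {model_name: {segment_length: csf_score}}
    """
    return sorted(models, key=lambda name: combined_csf(models[name]))

if __name__ == "__main__":
    # Made-up scores for illustration only.
    models = {
        "native": {5: 10.0, 7: 8.0},
        "decoy1": {5: 14.0, 7: 9.0},
        "decoy2": {5: 9.0, 7: 12.5},
    }
    print(rank_models(models))
```

Note that in this toy example decoy2 has the lowest 5‐residue score, yet the native structure ranks first once the lengths are combined.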

The 11 commonly used decoy sets we used to test OPUS‐CSF are the same as those used in GOAP,40 including the decoy sets 4state_reduced,59 fisa,58 fisa_casp3,58 hg_structal, ig_structal and ig_structal_hires (R. Samudrala, E. Huang, and M. Levitt, unpublished), I‐TASSER,39 lattice_ssfit,60, 61 lmds,62 MOULDER63 and ROSETTA.64

Accessibility of OPUS‐CSF

The scoring function is freely available to the academic community.

Supporting information


Acknowledgments

The authors wish to thank Robert L. Jernigan for careful reading of the manuscript and numerous comments on how to improve it. J.M. acknowledges support from the National Institutes of Health (R01‐GM067801, R01‐GM116280) and the Welch Foundation (Q‐1512). Q.W. acknowledges support from the National Institutes of Health (R01‐AI067839, R01‐GM116280), the Gillson‐Longenbaugh Foundation, and the Welch Foundation (Q‐1826).

References

1. MacKerell AD, Bashford D, Bellott M, Dunbrack RL, Evanseck JD, Field MJ, Fischer S, Gao J, Guo H, Ha S, Joseph‐McCarthy D, Kuchnir L, Kuczera K, Lau FT, Mattos C, Michnick S, Ngo T, Nguyen DT, Prodhom B, Reiher WE, Roux B, Schlenkrich M, Smith JC, Stote R, Straub J, Watanabe M, Wiórkiewicz‐Kuczera J, Yin D, Karplus M (1998) All‐atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B 102:3586–3616.
2. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M (1983) CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem 4:187–217.
3. Weiner SJ, Kollman PA, Nguyen DT, Case DA (1986) An all atom force field for simulations of proteins and nucleic acids. J Comput Chem 7:230–252.
4. Case DA, Cheatham TE, Darden T, Gohlke H, Luo R, Merz KM, Onufriev A, Simmerling C, Wang B, Woods RJ (2005) The Amber biomolecular simulation programs. J Comput Chem 26:1668–1688.
5. Arnautova YA, Jagielska A, Scheraga HA (2006) A new force field (ECEPP‐05) for peptides, proteins, and organic molecules. J Phys Chem B 110:5025–5044.
6. Marrink SJ, Risselada HJ, Yefimov S, Tieleman DP, De Vries AH (2007) The MARTINI force field: coarse grained model for biomolecular simulations. J Phys Chem B 111:7812–7824.
7. Liwo A, Ołdziej S, Pincus MR, Wawak RJ, Rackovsky S, Scheraga HA (1997) A united‐residue force field for off‐lattice protein‐structure simulations. I. Functional forms and parameters of long‐range side‐chain interaction potentials from protein crystal data. J Comput Chem 18:849–873.
8. Liwo A, Pincus MR, Wawak RJ, Rackovsky S, Ołdziej S, Scheraga HA (1997) A united‐residue force field for off‐lattice protein‐structure simulations. II. Parameterization of short‐range interactions and determination of weights of energy terms by Z‐score optimization. J Comput Chem 18:874–887.
9. Chebaro Y, Pasquali S, Derreumaux P (2012) The coarse‐grained OPEP force field for non‐amyloid and amyloid proteins. J Phys Chem B 116:8741–8752.
10. Skolnick J (2006) In quest of an empirical potential for protein structure prediction. Curr Opin Struct Biol 16:166–171.
11. Sippl MJ (1995) Knowledge‐based potentials for proteins. Curr Opin Struct Biol 5:229–235.
12. Jernigan RL, Bahar I (1996) Structure‐derived potentials and protein simulations. Curr Opin Struct Biol 6:195–209.
13. Moult J (1997) Comparison of database potentials and molecular mechanics force fields. Curr Opin Struct Biol 7:194–199.
14. Lazaridis T, Karplus M (2000) Effective energy functions for protein structure prediction. Curr Opin Struct Biol 10:139–145.
15. Gohlke H, Klebe G (2001) Statistical potentials and scoring functions applied to protein–ligand binding. Curr Opin Struct Biol 11:231–235.
16. Russ WP, Ranganathan R (2002) Knowledge‐based potential functions in protein design. Curr Opin Struct Biol 12:447–452.
17. Buchete N, Straub J, Thirumalai D (2004) Development of novel statistical potentials for protein fold recognition. Curr Opin Struct Biol 14:225–232.
18. Poole AM, Ranganathan R (2006) Knowledge‐based potentials in protein design. Curr Opin Struct Biol 16:508–513.
19. Zhou Y, Zhou H, Zhang C, Liu S (2006) What is a desirable statistical energy function for proteins and how can it be obtained? Cell Biochem Biophys 46:165–174.
20. Ma J (2009) Explicit orientation dependence in empirical potentials and its significance to side‐chain modeling. Acc Chem Res 42:1087–1096.
21. Gilis D, Biot C, Buisine E, Dehouck Y, Rooman M (2006) Development of novel statistical potentials describing cation‐π interactions in proteins and comparison with semiempirical and quantum chemistry approaches. J Chem Inform Model 46:884–893.
22. Hendlich M, Lackner P, Weitckus S, Floeckner H, Froschauer R, Gottsbacher K, Casari G, Sippl MJ (1990) Identification of native protein folds amongst a large number of incorrect models: the calculation of low energy conformations from potentials of mean force. J Mol Biol 216:167–180.
23. Hoppe C, Schomburg D (2005) Prediction of protein thermostability with a direction‐ and distance‐dependent knowledge‐based potential. Protein Sci 14:2682–2692.
24. Jones DT, Taylor WR, Thornton JM (1992) A new approach to protein fold recognition. Nature 358:86–89.
25. Koliński A, Bujnicki JM (2005) Generalized protein structure prediction based on combination of fold‐recognition with de novo folding and evaluation of models. Proteins 61:84–90.
26. Miyazawa S, Jernigan RL (1985) Estimation of effective interresidue contact energies from protein crystal‐structures: quasi‐chemical approximation. Macromolecules 18:534–552.
27. Sippl MJ (1990) Calculation of conformational ensembles from potentials of mean force: an approach to the knowledge‐based prediction of local structures in globular proteins. J Mol Biol 213:859–883.
28. Skolnick J, Kolinski A, Ortiz A (2000) Derivation of protein‐specific pair potentials based on weak sequence fragment similarity. Proteins 38:3–16.
29. Tobi D, Elber R (2000) Distance‐dependent, pair potential for protein folding: Results from linear optimization. Proteins 41:40–46.
30. Wu Y, Lu M, Chen M, Li J, Ma J (2007) OPUS‐Ca: a knowledge‐based potential function requiring only Cα positions. Protein Sci 16:1449–1463.
31. Zhang Y, Kolinski A, Skolnick J (2003) TOUCHSTONE II: a new approach to ab initio protein structure prediction. Biophys J 85:1145–1164.
32. DeBolt SE, Skolnick J (1996) Evaluation of atomic level mean force potentials via inverse folding and inverse refinement of protein structures: atomic burial position and pairwise non‐bonded interactions. Protein Eng 9:637–655.
33. Lu H, Skolnick J (2001) A distance‐dependent atomic knowledge‐based potential for improved protein structure selection. Proteins 44:223–232.
34. Lu M, Dousis AD, Ma J (2008) OPUS‐PSP: an orientation‐dependent statistical all‐atom potential derived from side‐chain packing. J Mol Biol 376:288–301.
35. Samudrala R, Moult J (1998) An all‐atom distance‐dependent conditional probability discriminatory function for protein structure prediction. J Mol Biol 275:895–916.
36. Shen M, Sali A (2006) Statistical potential for assessment and prediction of protein structures. Protein Sci 15:2507–2524.
37. Yang Y, Zhou Y (2008) Specific interactions for ab initio folding of protein terminal regions with secondary structures. Proteins 72:793–803.
38. Zhang C, Vasmatzis G, Cornette JL, DeLisi C (1997) Determination of atomic desolvation energies from the structures of crystallized proteins. J Mol Biol 267:707–726.
39. Zhang J, Zhang Y (2010) A novel side‐chain orientation dependent potential derived from random‐walk reference state for protein fold selection and structure prediction. PLoS One 5:e15386.
40. Zhou H, Skolnick J (2011) GOAP: a generalized orientation‐dependent, all‐atom statistical potential for protein structure prediction. Biophys J 101:2043–2052.
41. Zhou H, Zhou Y (2002) Distance‐scaled, finite ideal‐gas reference state improves structure‐derived potentials of mean force for structure selection and stability prediction. Protein Sci 11:2714–2726.
42. Noid W (2013) Perspective: Coarse‐grained models for biomolecular systems. J Chem Phys 139:090901.
43. Kmiecik S, Gront D, Kolinski M, Wieteska L, Dawid AE, Kolinski A (2016) Coarse‐grained protein models and their applications. Chem Rev 116:7898–7936.
44. Wu Y, Tian X, Lu M, Chen M, Wang Q, Ma J (2005) Folding of small helical proteins assisted by small‐angle X‐ray scattering profiles. Structure 13:1587–1597.
45. Wu Y, Chen M, Lu M, Wang Q, Ma J (2005) Determining protein topology from skeletons of secondary structures. J Mol Biol 350:571–586.
46. Maupetit J, Gautier R, Tufféry P (2006) SABBAC: online Structural Alphabet‐based protein BackBone reconstruction from Alpha‐Carbon trace. Nucleic Acids Res 34:W147–W151.
47. Kong Y, Ma J (2003) A structural‐informatics approach for mining β‐sheets: locating sheets in intermediate‐resolution density maps. J Mol Biol 332:399–413.
48. Kong Y, Zhang X, Baker TS, Ma J (2004) A structural‐informatics approach for tracing β‐sheets: Building pseudo‐Cα traces for β‐strands in intermediate‐resolution density maps. J Mol Biol 339:117–130.
49. Moore BL, Kelley LA, Barber J, Murray JW, MacDonald JT (2013) High‐quality protein backbone reconstruction from alpha carbons using Gaussian mixture models. J Comput Chem 34:1881–1889.
50. Reid LS, Thornton JM (1989) Rebuilding flavodoxin from Cα coordinates: a test study. Proteins 5:170–182.
51. Rey A, Skolnick J (1992) Efficient algorithm for the reconstruction of a protein backbone from the α‐carbon coordinates. J Comput Chem 13:443–456.
52. Liwo A, Wawak R, Scheraga H, Pincus M, Rackovsky S (1993) Calculation of protein backbone geometry from α‐carbon coordinates based on peptide‐group dipole alignment. Protein Sci 2:1697–1714.
53. Iwata Y, Kasuya A, Miyamoto S (2002) An efficient method for reconstructing protein backbones from α‐carbon coordinates. J Mol Graph Model 21:119–128.
54. Correa PE (1990) The building of protein structures from α‐carbon coordinates. Proteins 7:366–377.
55. Payne PW (2008) Reconstruction of protein conformations from estimated positions of the Cα coordinates. Protein Sci 2:315–324.
56. Zhang Y, Skolnick J (2004) Scoring function for automated assessment of protein structure template quality. Proteins 57:702–710.
57. Tang H‐Y, Zhang Z‐G (2007) Using C′ deviation to study structures of central amino acids in peptide fragments. Amino Acids 33:689–693.
58. Simons KT, Kooperberg C, Huang E, Baker D (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 268:209–225.
59. Park B, Levitt M (1996) Energy functions that discriminate X‐ray and near‐native folds from well‐constructed decoys. J Mol Biol 258:367–392.
60. Samudrala R, Xia Y, Levitt M, Huang E (1999) A combined approach for ab initio construction of low resolution protein tertiary structures from sequence. Pac Symp Biocomput 1999:505–516.
61. Xia Y, Huang ES, Levitt M, Samudrala R (2000) Ab initio construction of protein tertiary structures using a hierarchical approach. J Mol Biol 300:171–185.
62. Keasar C, Levitt M (2003) A novel approach to decoy set generation: designing a physical energy function having local minima with native structure characteristics. J Mol Biol 329:159–174.
63. John B, Sali A (2003) Comparative protein structure modeling by iterative alignment, model building and model assessment. Nucleic Acids Res 31:3982–3992.
64. Tsai J, Bonneau R, Morozov AV, Kuhlman B, Rohl CA, Baker D (2003) An improved protein decoy set for testing energy functions for protein structure prediction. Proteins 53:76–87.

Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society