Molecular & Cellular Proteomics : MCP
 
Mol Cell Proteomics. 2012 July; 11(7): M111.016808.
Published online 2012 March 13. doi: 10.1074/mcp.M111.016808
PMCID: PMC3394956

Retrieving Backbone String Neighbors Provides Insights Into Structural Modeling of Membrane Proteins*

Abstract

Identification of protein structural neighbors of a query is fundamental to structure and function prediction. Here we present BS-align, a systematic method that retrieves backbone string neighbors from primary sequences for use as templates in protein modeling. The backbone conformation of a protein is represented by a backbone string, defined over regions of Ramachandran space. The backbone string of a query can be accurately predicted using two innovations: a knowledge-driven sequence alignment and the encoding of a backbone string element profile. The predicted backbone string is then aligned against a backbone string database to retrieve a set of backbone string neighbors. These backbone string neighbors were shown to be close to the native structures of the query proteins. BS-align was successfully used to model 10 membrane proteins with lengths of 229 to 595 residues, whose high-resolution structures have been difficult to determine both experimentally and computationally. The TM-scores and root mean square deviations of the models confirmed that models based on the backbone string neighbors retrieved by BS-align were very close to the native membrane structures, even though each query shared very low sequence identity with its neighbor. The backbone string system offers a new route to predicting protein structure from sequence, and suggests that backbone string similarity may be more informative than assigning a protein to a fold.

Determining the structures of membrane proteins remains a relatively unexplored frontier in structural biology (1). Current computational methods include de novo modeling and comparative modeling. Compared with de novo modeling, comparative modeling is more successful when sequence homologs of known structure are available. However, because relatively few membrane protein structures have been determined experimentally, building membrane protein conformations remains an extremely difficult undertaking.

In comparative modeling and other methods based on known protein structures, identifying the best structural neighbor (template), if one is available at all, is critical. The typical method of template identification relies on serial pairwise sequence alignments aided by database search engines such as FASTA (2) and BLAST (3). More sensitive methods based on multiple sequence alignments (MSAs), including PSI-BLAST (4), CLUSTALW (5), and HMMER (6), are also available. MSAs have been shown to produce more potential templates and to better identify templates for sequences that are homologous to proteins of solved structure. However, when no significant homology is found, most target-template pairs are evolutionarily too distant to be detected by current threading approaches (7). If nothing is known about the target beyond its sequence, it becomes difficult both to identify possible templates and to thread the target correctly onto a known structure (for some models of threading, this problem is NP-hard (8)). On the other hand, when the query structure is known, structural alignment tools such as Structal (9), CE (10), SSM (11), TM-Align (12), and FragBag (13) can usually retrieve structural neighbors quickly and accurately, including proteins with low sequence similarity. From a structural point of view, the current Protein Data Bank (PDB)1 may already be sufficient to solve the problem of protein structure prediction (14). A strategy that bridges sequence-based and structure-based methods for retrieving protein neighbors could therefore substantially change the landscape of protein structure prediction.

Here, we introduce the BS-align algorithm, a backbone string-based pipeline that identifies structural neighbors from the primary sequence by representing protein structure as a backbone string, defined on discrete regions of Ramachandran space (15). The predicted backbone string is then used to retrieve a candidate set of structural neighbors by backbone string alignment. We show that very close neighbors of membrane proteins can be identified from their sequences.

First, we developed an accurate predictor of backbone strings based on two innovations: a knowledge-driven sequence alignment and a backbone string element profile that embodies structural evolutionary information. Then, we retrieved backbone string neighbors by aligning the predicted backbone string of a query sequence against a backbone string database built from proteins of known structure. The top n retrieved sequences constitute a candidate set of neighbors. On a benchmark set of protein sequences, our approach outperformed most threading methods. Finally, we demonstrated the abilities of BS-align on 10 membrane proteins with lengths of 229 to 595 residues, whose high-resolution structures have been difficult to determine both experimentally and computationally. Our results confirmed that the backbone string neighbors retrieved by BS-align were very close to the native membrane structures. Moreover, the conformational constraints encoded by backbone strings, which restrict each residue to a region of dihedral-angle space, could be used to guide structure construction procedures such as I-TASSER (16), Swiss-Model (17), and the Foldit online game (18).

MATERIALS AND METHODS

Backbone String Prediction

The flowchart of backbone string prediction is shown in supplemental Fig. S3. The PSI-BLAST algorithm was first used to search a query sequence against a database built from a nonredundant PDB chain set, nr3PDB, which divides the query into two parts: matched fragments and unmatched fragments. Next, patterns from the hallmark pattern library (see below) were used to hit the unmatched fragments and obtain hit segments. These hit segments, together with their flanking residues (±n, default n = 5), were aligned against nr3PDB using PHI-BLAST (19), which retrieved additional, shorter matching sequences. The matched fragments from the first alignment and the shorter sequences from the subsequent alignments were encoded as backbone string element profiles (see supplemental Fig. S5). The backbone string element profile comprises eight elements (S, R, U, V, K, A, T, and G), which were used as features for predicting the backbone string of the query. Finally, a conditional random field (CRF; see below) was used for modeling and prediction.
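The fragment bookkeeping in this pipeline (finding the unmatched parts of the query, locating hallmark pattern hits inside them, and adding ±n flanking residues) can be sketched as follows. This is a minimal illustration, not the BS-align code: the PSI-BLAST and PHI-BLAST searches are assumed to have been run elsewhere, matched regions are assumed to be 0-based half-open intervals, and hallmark patterns are assumed to match as exact substrings. All function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def unmatched_fragments(length: int, matched: List[Interval]) -> List[Interval]:
    """Return the parts of [0, length) not covered by any matched interval."""
    gaps: List[Interval] = []
    pos = 0
    for start, end in sorted(matched):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < length:
        gaps.append((pos, length))
    return gaps


def pattern_hits(query: str, fragments: List[Interval], patterns: List[str]) -> List[Interval]:
    """Locate exact occurrences of each pattern inside the unmatched fragments."""
    hits: List[Interval] = []
    for f_start, f_end in fragments:
        fragment = query[f_start:f_end]
        for pattern in patterns:
            i = fragment.find(pattern)
            while i != -1:
                hits.append((f_start + i, f_start + i + len(pattern)))
                i = fragment.find(pattern, i + 1)
    return sorted(hits)


def flanked_windows(query: str, hits: List[Interval], flank: int = 5) -> List[Tuple[int, int, str]]:
    """Extend each hit by `flank` residues on both sides, clipped to the sequence ends."""
    windows = []
    for start, end in hits:
        lo, hi = max(0, start - flank), min(len(query), end + flank)
        windows.append((lo, hi, query[lo:hi]))
    return windows


query = "ACDEFGHIKLMNPQRSTVWY"
gaps = unmatched_fragments(len(query), [(0, 5), (15, 20)])
hits = pattern_hits(query, gaps, ["KLM"])
print(flanked_windows(query, hits))  # [(3, 16, 'EFGHIKLMNPQRS')]
```

Each returned window is what would be passed on to the second (PHI-BLAST) search in this sketch.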

Hallmark Pattern Generation

One innovative feature of our approach is a knowledge-driven sequence alignment guided by seeds from a hallmark pattern library (HPL), which was instrumental in detecting structural similarities among highly divergent proteins. A hallmark pattern is a short consecutive sequence that is conserved in both sequence and backbone string. The HPL is the core of our approach and is believed to capture remote homology information in the “twilight zone.” The library was constructed in three steps to infer remote structural similarity among proteins.

First, we performed a traversal search for consecutive sequence patterns occurring with sufficient frequency in a representative nonredundant PDB chain set (nr0PDB, NCBI MMDB 2009 Dec, 0-level nonredundancy, 7775 entries in total). In our previous study (20), we introduced an algorithm that extracts local combinational variables at fixed locations from equal-length sequences. Here, the algorithm was extended to extract candidate patterns from unequal-length sequences without sequence alignment (see supplemental Fig. S4). Each short pattern was merged with every other single fragment that contained the same residue as the preceding fragment, forming potentially longer patterns while still meeting the frequency criterion. With the frequency criterion set to 100, a total of 5,667 consecutive sequence patterns were obtained. Because patterns grow progressively longer as extraction proceeds, this is a bottom-up method.
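A bottom-up extraction of frequent consecutive patterns can be sketched in an Apriori-like style: frequent patterns of length L whose suffix and prefix overlap are joined into candidates of length L + 1, which are kept only if they still meet the frequency threshold. This is an illustrative stand-in, not the algorithm of (20). The seed length, the counting of overlapping occurrences, the `max_len` cap, and the function names are assumptions; the text above uses a threshold of 100.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def count_occurrences(seqs: Iterable[str], candidates: Set[str], length: int) -> Counter:
    """Count overlapping occurrences of each length-`length` candidate across all sequences."""
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - length + 1):
            sub = s[i:i + length]
            if sub in candidates:
                counts[sub] += 1
    return counts


def frequent_patterns(seqs: List[str], min_freq: int, seed_len: int = 2, max_len: int = 30) -> Dict[str, int]:
    """Grow frequent consecutive patterns bottom-up from seeds of length `seed_len`."""
    seeds = Counter(s[i:i + seed_len] for s in seqs for i in range(len(s) - seed_len + 1))
    level = {p: c for p, c in seeds.items() if c >= min_freq}
    found = dict(level)
    length = seed_len
    while level and length < max_len:
        # Join step: p and q merge when p's last (length - 1) residues equal q's first (length - 1).
        by_prefix: Dict[str, List[str]] = {}
        for q in level:
            by_prefix.setdefault(q[:-1], []).append(q)
        candidates = {p + q[-1] for p in level for q in by_prefix.get(p[1:], [])}
        length += 1
        # Keep only the longer candidates that still satisfy the frequency criterion.
        counts = count_occurrences(seqs, candidates, length)
        level = {p: c for p, c in counts.items() if c >= min_freq}
        found.update(level)
    return found


print(frequent_patterns(["ABCDABCD", "ABCE"], min_freq=3))  # {'AB': 3, 'BC': 3, 'ABC': 3}
```

Because a longer pattern can never occur more often than its prefix, pruning at each level is safe and keeps the candidate set small.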

Second, hallmark patterns were defined as patterns conserved in both sequence and backbone string. For each position of a consecutive sequence pattern, the p value of the backbone string of the residue at that position was calculated according to a binomial model (see Eq. 1 below),

$$p_j = \sum_{k=m_j}^{N} \binom{N}{k}\, q_j^{\,k}\, (1 - q_j)^{N-k} \qquad (1)$$

where N denotes the number of occurrences of the pattern in nr0PDB, the representative nonredundant PDB chain set; m_j denotes the count of the most frequent backbone string at position j; and q_j denotes the background probability of that backbone string for the residue at position j. If any of the p values was less than 10^-6, the consecutive sequence pattern was identified as a significant hallmark pattern.
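The significance test can be written directly from the binomial upper tail, P(X ≥ m_j) for X ~ Binomial(N, q_j), checked against the 10^-6 cutoff. The sketch sums the tail in log space (via `math.lgamma`) because N can reach the hundreds or thousands, where `comb(N, k)` would overflow a float; that numerical choice and the function names are assumptions, not details from the text.

```python
import math
from typing import Sequence


def binomial_upper_tail(N: int, m: int, q: float) -> float:
    """P(X >= m) for X ~ Binomial(N, q), summed in log space to avoid overflow for large N."""
    if not 0.0 < q < 1.0:
        raise ValueError("background probability q must lie strictly between 0 and 1")
    log_q, log_1q = math.log(q), math.log1p(-q)
    total = 0.0
    for k in range(max(m, 0), N + 1):
        log_term = (math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
                    + k * log_q + (N - k) * log_1q)
        total += math.exp(log_term)
    return min(total, 1.0)


def is_hallmark(N: int, max_counts: Sequence[int], background: Sequence[float],
                alpha: float = 1e-6) -> bool:
    """Keep the pattern if any position's most frequent backbone string is significant (p < alpha)."""
    return any(binomial_upper_tail(N, m, q) < alpha for m, q in zip(max_counts, background))


# A 2-residue pattern seen 200 times: position 1 is unremarkable, position 2 is strongly conserved.
print(is_hallmark(200, [40, 150], [0.2, 0.3]))  # True
```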

Third, based on these p values, we selected 2761 hallmark patterns exhibiting conserved structures to construct the library. The HPL captures remote homology in both sequences and backbone strings and is an indispensable component of our approach.

Backbone String Element Profile

Another innovative feature of our approach is the backbone string element profile, which was used to encode features for prediction. The backbone string element profile of a query was generated as follows. First, the query sequence was aligned against nr3PDB, yielding the top n (default 10) subjects. Then, the backbone strings of the n subjects were retrieved from the backbone string database (BSD). Finally, the backbone string elements observed at each residue were counted and stored in eight bins, one per element. These bins form a vector, the backbone string element profile of that residue, which is considered to carry structural evolutionary information. The backbone string element profile was used as the feature set for modeling and prediction (more details about one-dimensional structural profiles can be found in our previous study (21)).
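The counting step can be sketched as below. It assumes each subject's backbone string has already been projected onto query coordinates (one character per query residue, with `-` where the subject does not cover that residue); that projection format, the bin order (taken from the element list S, R, U, V, K, A, T, G given earlier), and the function name are assumptions.

```python
from typing import List, Sequence

ELEMENTS = "SRUVKATG"  # the eight backbone string elements, in the order listed in the text
INDEX = {e: i for i, e in enumerate(ELEMENTS)}


def element_profile(subject_strings: Sequence[str], query_length: int) -> List[List[int]]:
    """Count backbone string elements per query residue over the top-n subjects.

    Each subject string is assumed to be projected onto the query: one character per
    query residue, with '-' (or any non-element symbol) where the subject is absent.
    """
    profile = [[0] * len(ELEMENTS) for _ in range(query_length)]
    for s in subject_strings:
        if len(s) != query_length:
            raise ValueError("subject string must be projected onto the query length")
        for pos, ch in enumerate(s):
            if ch in INDEX:
                profile[pos][INDEX[ch]] += 1
    return profile


for row in element_profile(["SSR", "SR-"], 3):
    print(row)
```

The sketch keeps raw counts; whether counts are normalized to frequencies before being used as features is not specified in the text.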

Backbone String Alignment

We retrieved backbone string neighbors by aligning the predicted backbone string of a query sequence against the BSD using BLAST (3). The similarity between the backbone strings of two sequences was measured by the BLAST e-value and percent identity. The top n backbone string neighbors and the query were then aligned together with CLUSTALW (5), used as a multiple backbone string alignment tool, to align the positions of the matched fragments. The resulting sequences constituted a set of neighbor candidates.
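The text uses BLAST for this search. As a simplified, self-contained stand-in, the sketch below scores a predicted backbone string against each database entry with a plain Smith-Waterman local alignment (linear gap penalty) and ranks entries by alignment score, breaking ties by percent identity over the aligned columns. The scoring values (match +2, mismatch -1, gap -2) and all names are illustrative assumptions; BLAST's seeding heuristics and e-values are not reproduced.

```python
from typing import Dict, List, Tuple


def local_align(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> Tuple[int, str, str]:
    """Smith-Waterman local alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a: List[str] = []
    out_b: List[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical, non-gap columns as a percentage of all aligned columns."""
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a) if aln_a else 0.0


def rank_neighbors(query: str, database: Dict[str, str], top_n: int = 10) -> List[Tuple[int, float, str]]:
    """Rank database backbone strings by local alignment score, then by percent identity."""
    scored = []
    for name, bs in database.items():
        score, qa, sa = local_align(query, bs)
        scored.append((score, percent_identity(qa, sa), name))
    scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return scored[:top_n]


print(rank_neighbors("SSSRRRTT", {"x": "SSSRRRTT", "y": "GGGGAAAA", "z": "SSSRRRGG"}, top_n=2))
```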

Conditional Random Fields

CRFs are a framework for building probabilistic models that segment and label sequence data. A CRF models the conditional probability of a label sequence given an observation sequence; once trained on observed sequences (X_obs), it labels a new observation sequence (X_test) by selecting the label sequence (Y_test) that maximizes the conditional probability p(Y_test | X_test). Here, the features were formed solely from the backbone string element profile, and the label of each residue was its backbone string element.
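In a first-order linear-chain CRF, p(Y | X) is proportional to the exponential of a sum of per-position and transition scores, so the label sequence that maximizes p(Y_test | X_test) can be found by Viterbi dynamic programming without computing the partition function. The sketch shows only that decoding step. It assumes a linear-chain structure (the text does not specify the CRF topology) and assumes the per-position (unary) and transition scores have already been computed from trained weights and the profile features; training is omitted and the names are hypothetical.

```python
from typing import List, Sequence


def viterbi(unary: Sequence[Sequence[float]], transition: Sequence[Sequence[float]], labels: str) -> str:
    """Return the highest-scoring label sequence.

    unary[t][y]: score of label y at position t (e.g. weights . profile features).
    transition[y_prev][y]: score of moving from label y_prev to label y.
    """
    n, k = len(unary), len(labels)
    if n == 0:
        return ""
    score = list(unary[0])
    back: List[List[int]] = []
    for t in range(1, n):
        new_score, ptr = [], []
        for y in range(k):
            best_prev = max(range(k), key=lambda yp: score[yp] + transition[yp][y])
            new_score.append(score[best_prev] + transition[best_prev][y] + unary[t][y])
            ptr.append(best_prev)
        score = new_score
        back.append(ptr)
    # Follow back-pointers from the best final label.
    y = max(range(k), key=lambda i: score[i])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return "".join(labels[i] for i in reversed(path))


unary = [[2, 0], [0, 1], [2, 0]]
print(viterbi(unary, [[0, 0], [0, 0]], "SR"))    # SRS: no transition preference
print(viterbi(unary, [[0, -5], [-5, 0]], "SR"))  # SSS: switching labels is penalized
```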

Modeling

We used the “alignment mode” of the Swiss-Model (17) workspace to model each target protein. The aligned query and template sequences were submitted to Swiss-Model. Of the models built from the 10 backbone string neighbors, the one with the best z-score was adopted as the final model of the query.

Software Availability

Software for protein backbone string prediction and protein backbone string alignment can be found at http://code.google.com/p/bs-align/.

RESULTS

In the present study, a protein structure is represented as a backbone string, in which each residue is assigned to one of eight discrete regions of Ramachandran space. In the literature, this representation has also been called a shape string (15) or a one-dimensional string (22). The distribution and abundance of backbone string elements are presented in supplemental Fig. S1 and supplemental Table S1. Fig. 1 illustrates how the backbone string is assigned to the backbone conformation of a protein and its neighbor.

Fig. 1.
Backbone string representation of protein structure. The corresponding backbone strings [S (red), R (green), U (blue), V (yellow), K (magenta), A (cyan), T (orange), G (wheat)] of amino acid residues are colored on both proteins. N denotes missing backbone ...

Performance of Backbone String Prediction

The training data set of 4234 chains was derived from PDB (24) entries released before 2010 that were determined by x-ray diffraction at a resolution of ≤2.0 Å with an R-factor of ≤0.25, and was culled at 25% sequence identity using PISCES (25).

Table I lists the performance of five-fold cross-validation on the 4234 nonredundant chains. The first row of Table I is the Segment Overlap (SOV) measure (26), one of the evaluation criteria used in the Critical Assessment of Techniques for Protein Structure Prediction (CASP). To calculate the overall accuracy for three-state backbone strings (S3), we mapped the eight-state backbone string (S8) to three states as follows: S, R, U, V → S; A, K → H; and T, G → T. The proposed method achieved overall per-residue accuracies of 88.7% for three-state and 80.9% for eight-state backbone strings, and an SOV of 86.4%, which is very close to the theoretical upper limit of accuracy for secondary structure prediction (27).
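The S8-to-S3 mapping and the per-residue accuracy can be sketched as follows. The mapping is taken from the text; the function names are hypothetical, and residues with missing backbone (marked N in Fig. 1) are assumed to have been removed beforehand, since they have no entry in the mapping. SOV is a segment-based measure and is not reproduced here.

```python
S8_TO_S3 = {"S": "S", "R": "S", "U": "S", "V": "S",  # S, R, U, V -> S
            "A": "H", "K": "H",                      # A, K -> H
            "T": "T", "G": "T"}                      # T, G -> T


def to_three_state(bs8: str) -> str:
    """Map an eight-state backbone string onto the three-state alphabet."""
    return "".join(S8_TO_S3[c] for c in bs8)


def per_residue_accuracy(pred: str, native: str) -> float:
    """Fraction of residues whose predicted state equals the native state."""
    if len(pred) != len(native) or not native:
        raise ValueError("strings must be non-empty and of equal length")
    return sum(p == n for p, n in zip(pred, native)) / len(native)


pred8, native8 = "SAKT", "RKKG"
print(per_residue_accuracy(pred8, native8))                                  # 0.25 (S8)
print(per_residue_accuracy(to_three_state(pred8), to_three_state(native8)))  # 1.0 (S3)
```

The example shows why S3 accuracy is always at least S8 accuracy: errors between elements that share a three-state class disappear after the mapping.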

Table I
The performances of five-fold cross-validation

To assess our backbone string prediction (BSP) method and the effect of the hallmark patterns, we used the latest EVA set (28) as an independent test set of 79 proteins (one of the 80 entries in the EVA set was discarded). The detailed results are listed in supplemental Table S2. Our method achieved an SOV of 82.0% and an S3 of 83.6%, outperforming an existing state-of-the-art method, Frag1D (22), by at least 6.9% in S3 and 4.7% in SOV. The more demanding S8 measure also improved markedly (S8 of 74.4%, 6.8% higher than Frag1D). Using the hallmark patterns likewise improved performance, raising S3 by 9.2% and S8 by 6.2% relative to predictions made without them.

To assess the BSP method on newly determined structures, we constructed an additional independent test set of 916 chains from PDB entries released in 2010, determined by x-ray diffraction at a resolution of ≤2.0 Å with an R-factor of ≤0.25, and culled at 25% sequence identity. Our method achieved an S3 of 84.6% and an S8 of 75.3%. The per-state accuracies for the three-state backbone strings S, H, and T were 82.0%, 88.4%, and 69.5%, respectively (see supplemental Table S3). This performance on newly determined PDB structures demonstrates that the proposed method predicts protein backbone strings accurately.

Backbone String Database

We used the observed backbone strings of all proteins of known structure in the nr3PDB database (NCBI MMDB 2009 Dec, three-level nonredundancy, 40849 entries in total) to construct the backbone string database (BSD), which served as the benchmark alignment database. When we reduced the redundancy of the BSD with CD-HIT (29), the number of entries decreased rapidly (Fig. 2), indicating that backbone strings are more conserved than sequences. These observations indicate that the backbone string retains strong structural integrity and can be considered a bridge between sequence-based and structure-based methods. When the backbone string identity threshold was reduced to 50%, the number of remaining entries was approximately equal to the number of folds in SCOP (1193, V1.75, 2009) (30). This finding implies that the backbone string may be a good criterion for protein classification. Moreover, backbone string similarity is the foundation of BS-align and is especially useful when sequence alignment is infeasible.
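CD-HIT clusters sequences greedily: entries are processed from longest to shortest, and each joins the first existing representative it matches at or above the identity threshold, or otherwise becomes a new representative. The sketch applies that greedy scheme to backbone strings under a simplifying assumption: identity is approximated as the longest common subsequence length divided by the length of the shorter string, rather than CD-HIT's own banded alignment and short-word filter. All names are hypothetical.

```python
from typing import Dict, List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (two-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Approximate identity: shared aligned elements over the shorter string's length."""
    shorter = min(len(a), len(b))
    return lcs_length(a, b) / shorter if shorter else 0.0


def greedy_cluster(strings: List[str], threshold: float) -> Dict[int, List[int]]:
    """Map each representative's index to the indices of its cluster members."""
    order = sorted(range(len(strings)), key=lambda i: -len(strings[i]))  # longest first
    members: Dict[int, List[int]] = {}
    for i in order:
        for rep in members:
            if identity(strings[i], strings[rep]) >= threshold:
                members[rep].append(i)
                break
        else:
            members[i] = [i]  # no representative is close enough: start a new cluster
    return members


bsd = ["SSSSRRRR", "SSSSRRRU", "GGGGTTTT"]
print(len(greedy_cluster(bsd, 0.5)), len(greedy_cluster(bsd, 0.9)))  # 2 3
```

Lowering the threshold merges more entries into fewer clusters, which is the trend plotted in Fig. 2.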

Fig. 2.
Backbone string redundancy of BSD.

Performance of BS-align

A nonredundant set of 620 sequences (31) was used to test the performance of threading and modeling approaches. Fig. 3 presents the results of the different methods on this benchmark set; all homologous templates with >30% sequence identity to the targets were removed from the template library (BSD). The average TM-score (12) (a TM-score of about 0.5 or higher indicates significant structural similarity), average root mean square deviation (RMSD), and average alignment coverage over all 620 queries were used to evaluate performance.

Fig. 3.
The results on the benchmark set for 11 state-of-the-art threading methods and BS-align. Computations for FUGUE (32), PROSPECT2 (33), HHSEARCH (34), SPARKS2 (35), SP3 (36), SAM-T02 (37), PCONS5 (38), PAINT, PPA-I, PPA-II, and LOMETS (31) were previously ...

The first model of BS-align achieved an average TM-score of 0.5671 and an average RMSD of 2.93 Å, both substantially better than those of the best of the other methods (TM-score = 0.4287, RMSD = 6.92 Å). The average coverage achieved by BS-align was 0.841.
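For readers less familiar with these measures, the sketch below computes TM-score, RMSD, and coverage from the Cα distances of aligned residue pairs under one fixed superposition. It uses the standard TM-score scale d0 = 1.24·(L - 15)^(1/3) - 1.8, with normalization by the target length L. Two points are assumptions: the 0.5 Å floor on d0 for short chains, and the use of a single superposition (the TM-score program itself searches over superpositions to maximize the score). The function names are hypothetical.

```python
import math
from typing import Sequence


def d0(target_length: int) -> float:
    """TM-score distance scale in Å, floored at 0.5 Å for short chains (assumed convention)."""
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one superposition: sum over aligned pairs, normalized by target length."""
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length


def rmsd(distances: Sequence[float]) -> float:
    """Root mean square deviation over aligned Cα pairs."""
    if not distances:
        raise ValueError("need at least one aligned pair")
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def coverage(distances: Sequence[float], target_length: int) -> float:
    """Fraction of target residues that are aligned."""
    return len(distances) / target_length


dists = [1.0, 2.0, 3.0] * 30  # 90 aligned pairs on a 100-residue target
print(round(tm_score(dists, 100), 3), round(rmsd(dists), 2), coverage(dists, 100))
```

Because every aligned pair contributes at most 1 and the sum is divided by the full target length, TM-score penalizes low coverage, whereas RMSD is computed over the aligned pairs only.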

Retrieving Membrane Protein Neighbors by BS-align

To demonstrate that BS-align can retrieve backbone string neighbor candidates from sequence and use them to model near-native structures of membrane proteins, we used 10 large (>200 residues), topologically complex membrane proteins from recent publications as a blind test set. Table II lists, for each membrane protein, the backbone string neighbor that yielded the model with the best z-score (39). Model quality was measured by the Cα RMSD between the model and the experimentally determined structure and by the TM-score. In all cases, the models had RMSDs below 5.1 Å and TM-scores of at least 0.68. The coverage was more than 81.8%, and the sequence identity was less than 40%.

Table II
Blind data test. TM-score, template modeling score; RMSD, calculated over aligned residues; COV, coverage of aligned regions over the target (native) sequence; BSN, backbone string neighbor; Identity, sequence identity between the target sequence and the BSN sequence

cbb3 Cytochrome Oxidase

The structure of a C-family heme-copper oxidase (HCO) (PDB ID 3MK7) was determined at 3.2 Å resolution by exploiting the favorable anomalous scattering properties of iron for phase determination (40). The prediction was highly accurate, with an RMSD of 3.38 Å and a TM-score of 0.80 (Fig. 4A). 3MK7 is 474 residues long and shares only 16.0% pairwise sequence identity with its neighbor, recombinant cytochrome ba3 oxidase (PDB ID 1XME). Although highly divergent in amino acid sequence, different types of HCOs share a similar structure, suggesting a similar mechanism of action (40, 41).

Fig. 4.
Predicted membrane protein structures. Superposition of the native protein (red) and the generated model (green): A, 3MK7A; B, 2WSWA; C, 2YL4A; D, 3A2RX. All images were produced using PyMOL (23).

PorB

The structure of Neisseria meningitidis PorB (PDB ID 3A2R) was determined at 2.3 Å resolution using isomorphous and anomalous contributions from three heavy-atom derivatives (42). The prediction was relatively good, with an RMSD of 3.78 Å (Fig. 4D) and a TM-score of 0.68, even though PorB shares only 16.6% pairwise sequence identity with its backbone string neighbor, a protein from Escherichia coli K-12 (PDB ID 3HWB). PorB was previously predicted to share a stable 16-stranded β-barrel scaffold with other porins; however, although superimposing PorB on known structures gave low RMSD values, only 20 absolutely conserved residues were identified (42), which makes primary sequence-based identification of structural neighbors extremely difficult.

Carnitine transporter protein (CaiT)

The structure of the sodium-independent carnitine/butyrobetaine antiporter CaiT from Proteus mirabilis (PmCaiT) (PDB ID 2WSW) was previously solved at 2.3 Å resolution by molecular replacement with a poly-alanine model of BetP (PDB ID 2WIT) (43). Here, BS-align retrieved a neighbor closer to the native structure and produced a PmCaiT model with an RMSD of 2.08 Å (Fig. 4B).

ATP-binding cassette transporter

The model of the first human ATP-binding cassette transporter with a determined structure (PDB ID 2YL4) was less accurate, with an RMSD of 5.03 Å and a TM-score of 0.73 (Fig. 4C). This may reflect the large size of the protein (595 residues) and its high topological complexity: it consists of six transmembrane domains and two cytoplasmic nucleotide-binding domains. The other six targets are illustrated in supplemental Fig. S2.

DISCUSSION

Hallmark Pattern

Despite advances in motif discovery, little research has focused on the relationship between sequence patterns and their corresponding structures. The hallmark pattern proposed in this study combines a sequence pattern with its backbone string. We used short, conformationally restricted consecutive sequences as seeds to guide sequence alignment in unmatched regions, which are typically regarded as remote homology regions. Our observation was that, when sequence similarity was low, this knowledge-driven method produced better sequence alignments than sequence similarity alone. The HPL (see Materials and Methods) is the core library of the BS-align algorithm, and we believe that it will help detect remote homology in the “twilight zone” (≤25% similarity).

The Advantages of the Backbone String

Introducing the backbone string into machine learning offers three advantages. First, and most prominently, the backbone string describes the protein backbone structure in detail. Many studies have taken advantage of backbone conformational information (44–46), and our previous work (47, 48) showed that backbone strings are also important for turn identification.

Second, the backbone string is more conserved than the sequence. When the backbone string identity of the BSD was reduced to 40% (Fig. 2), only 74 entries (72 proteins) remained, indicating that cross-fold similarities are abundant among geometrically similar proteins. According to the SCOP classification, these entries included 24 all-beta proteins, 14 alpha and beta proteins (a+b), 14 small proteins, 12 alpha and beta proteins (a/b), 10 all-alpha proteins, four multidomain proteins, three membrane and cell surface proteins, four peptides, and one coiled coil protein, with lengths ranging from 54 to 2512 residues. This implies that protein structures are fairly conserved, and suggests that the backbone string may be a suitable classification criterion and that a backbone string-based library may be more reasonable and compact than existing libraries.

Third, backbone strings are relatively simple to align, because they consist of eight discrete elements rather than precise backbone torsion-angle values. The accuracy of the alignment between a query sequence and a known template structure is key to the accuracy of the final three-dimensional model, and obtaining a good alignment is difficult. Sequence alignments may lack sufficient structural quality, whereas secondary structure element alignment (SSEA) is not suited to accurate structural alignment. Backbone strings, as described here, can be aligned against a database with established algorithms such as BLAST and produce accurate results.

Backbone String Neighbors

The backbone string neighbor is a structural neighbor identified from sequence, and represents a new bridge from sequence to structure. Despite the crucial functions performed by membrane proteins in living cells, it remains frustratingly difficult to obtain high-resolution 3D structures of membrane proteins (49). We identified backbone string neighbors of membrane proteins and showed that our algorithm can build models of membrane protein structures from sequence, even for large proteins with complex topologies. Because structural and functional characterization of membrane proteins relies on the isolation and purification of large amounts of protein, we believe that BS-align will play an important role in computational biology, both as a prediction approach and in combination with experimental approaches.

Footnotes

* This work was supported by the National Natural Science Foundation of China grants (20675057, 20705024).

This article contains supplemental Figs. S1 to S5 and Tables S1 to S3.

1 The abbreviations used are:

BLAST
the Basic Local Alignment Search Tool
BS
backbone string
BS-align
backbone string alignment
BSD
backbone string database
BSN
backbone string neighbor
BSP
backbone string prediction
CRF
conditional random field
HPL
hallmark pattern library
MSA
multiple sequence alignment
nr0PDB
a representative non-redundant PDB chain set (NCBI MMDB 2009 Dec, 0-level non-redundancy)
nr3PDB
a nonredundant PDB chain set (NCBI MMDB 2009 Dec, 3-level non-redundancy)
PDB
Protein Data Bank
Pred_BS
predicted backbone string
RMSD
root mean square deviation
S3
three-state backbone string
S8
eight-state backbone string
Seq
sequence
SOV
segment overlap measure
SSEA
secondary structure element alignment
3D
three-dimensional
TM-score
template modeling score

REFERENCES

1. Arora A., Tamm L. K. (2001) Biophysical approaches to membrane protein structure determination. Curr. Opin. Struct. Biol. 11, 540–547
2. Lipman D. J., Pearson W. R. (1985) Rapid and sensitive protein similarity searches. Science 227, 1435–1441
3. Altschul S. F., Gish W., Miller W., Myers E. W., Lipman D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410
4. Altschul S. F., Madden T. L., Schäffer A. A., Zhang J., Zhang Z., Miller W., Lipman D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402
5. Larkin M. A., Blackshields G., Brown N. P., Chenna R., McGettigan P. A., McWilliam H., Valentin F., Wallace I. M., Wilm A., Lopez R., Thompson J. D., Gibson T. J., Higgins D. G. (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948
6. Krogh A., Brown M., Mian I. S., Sjölander K., Haussler D. (1994) Hidden markov models in computational biology: applications to protein modeling. J. Mol. Biol. 235, 1501–1531
7. Zhang Y. (2008) Progress and challenges in protein structure prediction. Curr. Opin. Struct. Biol. 18, 342–348
8. Lathrop R. H. (1994) The protein threading problem with sequence amino acid interaction preferences is NP-complete. Protein Engineering 7, 1059–1068
9. Subbiah S., Laurents D. V., Levitt M. (1993) Structural similarity of DNA-binding domains of bacteriophage repressors and the globin core. Curr. Biol. 3, 141–148
10. Shindyalov I. N., Bourne P. E. (1998) Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Engineering 11, 739–747
11. Krissinel E., Henrick K. (2004) Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr. D Biol. Crystallogr. 60, 2256–2268
12. Zhang Y., Skolnick J. (2005) TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309
13. Budowski-Tal I., Nov Y., Kolodny R. (2010) FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately. Proc. Natl. Acad. Sci. U.S.A. 107, 3481–3486
14. Zhang Y., Skolnick J. (2005) The protein structure prediction problem could be solved using the current PDB library. Proc. Natl. Acad. Sci. U.S.A. 102, 1029–1034
15. Ison R. E., Hovmoller S., Kretsinger R. H. (2005) Proteins and their shape strings. An exemplary computer representation of protein structure. IEEE Eng. Med. Biol. Mag. 24, 41–49
16. Roy A., Kucukural A., Zhang Y. (2010) I-TASSER: a unified platform for automated protein structure and function prediction. Nat. Protoc. 5, 725–738
17. Bordoli L., Kiefer F., Arnold K., Benkert P., Battey J., Schwede T. (2008) Protein structure homology modeling using SWISS-MODEL workspace. Nat. Protoc. 4, 1–13
18. Cooper S., Khatib F., Treuille A., Barbero J., Lee J., Beenen M., Leaver-Fay A., Baker D., Popović Z., Players F. (2010) Predicting protein structures with a multiplayer online game. Nature 466, 756–760
19. Zhang Z., Schäffer A. A., Miller W., Madden T. L., Lipman D. J., Koonin E. V., Altschul S. F. (1998) Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res. 26, 3986–3990
20. Xiong W., Li T., Chen K., Tang K. (2009) Local combinational variables: an approach used in DNA-binding helix-turn-helix motif prediction with sequence information. Nucleic Acids Res. 37, 5632–5640
21. Li D., Li T., Cong P., Xiong W., Sun J. (2012) A novel structural position-specific scoring matrix for the prediction of protein secondary structures. Bioinformatics 28, 32–39
22. Zhou T., Shu N., Hovmöller S. (2010) A Novel Method for Accurate One-dimensional Protein Structure Prediction Based on Fragment Matching. Bioinformatics 26, 470–477
23. DeLano W. L. (2002) PyMOL 0.99. The PyMOL Molecular Graphics System (DeLano Scientific, Palo Alto CA)
24. Bernstein F. C., Koetzle T. F., Williams G. J. B., Meyer E. F., Jr., Brice M. D., Rodgers J. R., Kennard O., Shimanouchi T., Tasumi M. (1978) The protein data bank: A computer-based archival file for macromolecular structures. Arch. Biochem. Biophys. 185, 584–591
25. Wang G., Dunbrack R. L., Jr. (2003) PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591
26. Rost B., Sander C., Schneider R. (1994) Redefining the goals of protein secondary structure prediction. J. Mol. Biol. 235, 13–26
27. Rost B. (2003) Rising accuracy of protein secondary structure prediction. Protein structure determination, analysis, and modeling for drug discovery, pp. 207–249, Dekker Publishing, NY
28. Koh I. Y., Eyrich V. A., Marti-Renom M. A., Przybylski D., Madhusudhan M. S., Eswar N., Graña O., Pazos F., Valencia A., Sali A., Rost B. (2003) EVA: evaluation of protein structure prediction servers. Nucleic Acids Res. 31, 3311–3315
29. Li W., Godzik A. (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659
30. Murzin A. G., Brenner S. E., Hubbard T., Chothia C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540
31. Wu S., Zhang Y. (2007) LOMETS: a local meta-threading-server for protein structure prediction. Nucleic Acids Res. 35, 3375–3382
32. Shi J., Blundell T. L., Mizuguchi K. (2001) FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol. 310, 243–257
33. Xu Y., Xu D. (2000) Protein threading using PROSPECT: design and evaluation. Proteins: Structure, Function, Bioinformatics 40, 343–354
34. Söding J. (2005) Protein homology detection by HMM–HMM comparison. Bioinformatics 21, 951–960
35. Zhou H., Zhou Y. (2004) Single-body residue-level knowledge-based energy score combined with sequence-profile and secondary structure information for fold recognition. Proteins: Structure, Function, Bioinformatics 55, 1005–1013
36. Zhou H., Zhou Y. (2005) Fold recognition by combining sequence profiles derived from evolution and from depth-dependent structural alignment of fragments. Proteins: Structure, Function, Bioinformatics 58, 321–328
37. Karplus K., Karchin R., Draper J., Casper J., Mandel-Gutfreund Y., Diekhans M., Hughey R. (2003) Combining local-structure, fold-recognition, and new fold methods for protein structure prediction. Proteins: Structure, Function, Bioinformatics 53, 491–496
38. Wallner B., Elofsson A. (2005) Pcons5: combining consensus, structural evaluation and fold recognition scores. Bioinformatics 21, 4248–4254
39. Benkert P., Biasini M., Schwede T. (2011) Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics 27, 343–350
40. Buschmann S., Warkentin E., Xie H., Langer J. D., Ermler U., Michel H. (2010) The Structure of cbb3 Cytochrome Oxidase Provides Insights into Proton Pumping. Science 329, 327–330
41. Tiefenbrunn T., Liu W., Chen Y., Katritch V., Stout C. D., Fee J. A., Cherezov V. (2011) High Resolution Structure of the ba3 Cytochrome c Oxidase from Thermus thermophilus in a Lipidic Environment. PLoS ONE 6, e22348.
42. Tanabe M., Nimigean C. M., Iverson T. M. (2010) Structural basis for solute transport, nucleotide regulation, and immunological recognition of Neisseria meningitidis PorB. Proc. Natl. Acad. Sci. U.S.A. 107, 6811–6816
43. Schulze S., Köster S., Geldmacher U., Terwisscha Van Scheltinga A. C., Kühlbrandt W. (2010) Structural basis of Na+-independent and cooperative substrate/product antiport in CaiT. Nature 467, 233–236
44. Gong H., Fleming P. J., Rose G. D. (2005) Building native protein conformation from highly approximate backbone torsion angles. Proc. Natl. Acad. Sci. U.S.A. 102, 16227–16232
45. Porter L. L., Rose G. D. (2011) Redrawing the Ramachandran plot after inclusion of hydrogen-bonding constraints. Proc. Natl. Acad. Sci. U.S.A. 108, 109–113
46. Ting D., Wang G., Shapovalov M., Mitra R., Jordan M. I., Dunbrack R. L., Jr. (2010) Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model. PLoS Comput. Biol. 6, e1000763.
47. Tang Z., Li T., Liu R., Xiong W., Sun J., Zhu Y., Chen G. (2011) Improving the performance of β-turn prediction using predicted shape strings and a two-layer support vector machine model. BMC Bioinformatics 12, 283.
48. Zhu Y., Li T., Li D., Zhang Y., Xiong W., Sun J., Tang Z., Chen G. (2011) Using predicted shape string to enhance the accuracy of γ-turn prediction. Amino Acids doi: 10.1007/s00726-011-0889-z
49. Elofsson A., von Heijne G. (2007) Membrane Protein Structure: prediction versus Reality. Annu. Rev. Biochem. 76, 125–140
