Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.
The value of a protein's three-dimensional (3D) structure for understanding its molecular function is enormous because it provides a solid framework for planning experiments and for interpreting their results. Since experimental structure determination is very expensive and is not always successful, theoretical structure prediction has become an important area of modern biology. The protein structure prediction community has undertaken several initiatives to assess the capabilities and limitations of current prediction methods: CASP (1), CAFASP (2), Livebench (3) and EVA (4). A major finding from the latest assessments is that better structure predictions can be obtained by combining the results of several different methods, because the methods have different strengths and weaknesses. The CASP4 experiment showed that the group named CAFASP-consensus, which filed predictions extracted from a number of automated servers, performed considerably better than any individual server and better than all but six human predictors (5). In the most recent experiment, CASP5, the success of various ‘meta-servers’ and of groups that used them to judiciously combine results obtained by several different methods was evident in all 3D structure modeling categories, from Comparative Modeling (CM), to Fold Recognition (FR), to Novel Fold (NF) prediction, as well as in the secondary structure (SS) prediction category (http://predictioncenter.llnl.gov/casp5/).
As reported by others (6,7) and in our hands as well, the use of manually refined multiple sequence alignments (MSA) as structure prediction queries significantly improves model quality (agreement with the real structure) over predictions based on single sequences. Most of the individual structure prediction methods (SS prediction as well as FR) allow users to submit their own alignment or provide a BLAST or PSI-BLAST (8) utility to build an MSA automatically. The quality of an MSA obtained by automatic methods is usually acceptable, but user-defined alignments become clearly superior if the query protein has few or no close homologs in the sequence database used by default by the FR server. Moreover, the divergence of the query sequence (for instance the presence of very long loops) often leads to significant errors in automatically generated alignments. During the recent CASP4 and CASP5 experiments, in many cases we were able to obtain confident predictions of the correct fold only when we submitted a refined MSA, which often included additional sequences obtained from unfinished genomes or the EST databases. Such sequences are not available in the default databases and can sometimes increase the size of the MSA more than 5-fold. This is critical when one compares the evolutionary information contained in an automatic alignment of 2–5 sequences with that in a user-defined MSA of 10–25 sequences. Accordingly, when we submitted single sequences for automatic MSA building for such targets, prediction results often became ambiguous (data not shown). Submission of manually refined MSAs to FR servers allowed us to achieve high rankings, consistently among the best groups in both CM and FR categories [i.e. CASP5 (9) and the assessment summary for BioInfo.PL in CASP4 (5,10)].
The existing meta-servers (for instance those available at http://bioinfo.pl or http://bioserv.infobiosud.univ-montp1.fr) provide a convenient interface for submission of prediction queries to multiple methods, unified presentation of their results and inference of a rational consensus. However, they do not allow the user to submit an MSA to the FR servers, which in our opinion is a critical issue in 3D structure prediction. Besides, the issue of confidentiality of results is not always addressed. For instance, the BioInfo metaserver (11) publishes a list of all prediction queries together with the addresses of the computers from which the queries were submitted, and also makes the prediction results freely available to everybody. This may strongly discourage users who do not want the prediction query (which may be a novel protein) or the results of the analysis to be revealed to any third party.
Our aim was to create a convenient, secure and simple on-line structure prediction service for users who prefer not to sacrifice quality for speed by relying unreservedly on automatic database searches, but choose to submit manually refined sequence alignments in order to obtain potentially more accurate predictions. Hence, we developed a novel WWW ‘meta-server’ as a gateway to several protein structure prediction methods, which addresses the two aforementioned key issues (i.e. MSA submission and data confidentiality). The GeneSilico server facilitates access to several structure prediction methods through a single, secure and user-friendly WWW interface. Its architecture allows easy web scripting, which greatly facilitates automated submission and retrieval of data by clients (user-agents) based on XML-RPC serialization. In keeping with object-oriented programming standards, each page is actually an object that can be serialized and/or treated as a method. Our server is freely available at the URL http://genesilico.pl/meta for academic users who sign a license agreement. Some of the components may be unavailable for commercial users, who are nevertheless welcome to contact us in order to obtain a separate limited license.
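As an illustration of the XML-RPC serialization mentioned above, the following minimal sketch shows how a client could encode a submission request with Python's standard xmlrpc.client module. The method name submit_query and the payload fields name and alignment are hypothetical; the text does not describe the server's actual remote interface, so this only demonstrates the serialization step, not the real API.

```python
import xmlrpc.client

# Hypothetical method name; the real server interface is not described here.
METHOD_NAME = "submit_query"


def build_submission_request(query_name, aligned_sequences):
    """Serialize a prediction query into an XML-RPC request body."""
    payload = {"name": query_name, "alignment": list(aligned_sequences)}
    # dumps() expects the call parameters as a tuple.
    return xmlrpc.client.dumps((payload,), methodname=METHOD_NAME)


def parse_request(xml_text):
    """Decode an XML-RPC request body back into (method name, payload)."""
    params, method = xmlrpc.client.loads(xml_text)
    return method, params[0]


if __name__ == "__main__":
    xml_body = build_submission_request("target1", ["MKT-AYL", "MKTGAYL"])
    print(xml_body)
    print(parse_request(xml_body))
```

In a real client the same call would typically go through xmlrpc.client.ServerProxy pointed at the server's XML-RPC endpoint, which is not specified here and would have to be obtained from the server documentation.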
The user has several options for submission of the prediction query. Our server accepts both single sequences and alignments. If a single sequence is submitted, each method generates its own MSA, as in the other meta-servers. If an MSA is submitted, the user can choose between submitting a full-length query and limiting the analysis to regions with less than 30% gaps in the alignment. The second option removes highly divergent loops, which often cause problems when matching the core structures of the template and the target. The user-defined MSA is submitted to those FR servers which allow this format of submission (see the http://genesilico.pl/meta web site for details). For submission to those servers which accept only single sequences and always build their own MSA, the user-defined MSA is converted into a ‘consensus sequence’; again, the user is free to choose between several alternative methods for consensus generation. We tested the value of MSA submission during the recent CASP5 experiment: the quality of the target-template alignments we obtained from our meta-server was exceptional, which greatly helped us to ‘win’ the homology modeling contest (9).
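The two MSA preprocessing options described above can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: the 30% gap limit is interpreted here per alignment column, and a simple majority rule stands in for the "several alternative methods" of consensus generation, which the text does not specify. The function names are hypothetical.

```python
from collections import Counter

GAP = "-"


def trim_gappy_columns(alignment, max_gap_fraction=0.30):
    """Keep only columns whose gap fraction is below max_gap_fraction.

    Assumption: the 30% limit is applied column by column.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_rows = len(alignment)
    keep = [
        col
        for col in range(length)
        if sum(row[col] == GAP for row in alignment) / n_rows < max_gap_fraction
    ]
    return ["".join(row[col] for col in keep) for row in alignment]


def majority_consensus(alignment):
    """Return the most common non-gap residue of each column.

    Assumption: majority rule; columns made up only of gaps are skipped.
    """
    consensus = []
    for column in zip(*alignment):
        residues = [res for res in column if res != GAP]
        if residues:
            consensus.append(Counter(residues).most_common(1)[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    msa = ["MKT-AYL", "MKTGAYL", "MRT-AYI", "MKT-SYL"]
    trimmed = trim_gappy_columns(msa)
    print(trimmed)
    print(majority_consensus(trimmed))
```

In this toy alignment the fourth column is 75% gaps and is dropped, which mimics the removal of a divergent loop; the consensus is then built from the remaining core columns.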
The currently installed components of the GeneSilico meta-server include:
The GeneSilico meta-server is continuously upgraded and enhanced with new tools. We hope that it will be as useful for the wider community as it was for us in the CASP5 experiment, as well as in our daily work on protein sequence analysis and structure prediction.
We are grateful to developers of the protein fold recognition servers who kindly agreed to have them included in our meta-server, in particular Drs Arne Elofsson, Daniel Fischer, Adam Godzik, David Jones, Kevin Karplus, Lawrence Kelley, Kenji Mizuguchi and Jinbo Xu. J.M.B. is a fellow of the Foundation for Polish Science. This work was supported by the EMBO and HHMI Young Investigator Programme award to J.M.B. and by KBN (grants 3P04A 011 24 to J.M.B. and 3P05A 020 24 to M.A.K.).