The Vienna RNA secondary structure server provides a web interface to the most frequently used functions of the Vienna RNA software package for the analysis of RNA secondary structures. It currently offers prediction of secondary structure from a single sequence, prediction of the consensus secondary structure for a set of aligned sequences and the design of sequences that will fold into a predefined structure. All three services can be accessed via the Vienna RNA web server at http://rna.tbi.univie.ac.at/.
Biomolecules exhibit a close interplay between structure and function. Therefore the growing number of RNA molecules with complex functions, beyond that of encoding proteins, has brought increased demand for RNA structure prediction methods. While prediction of tertiary structure is usually infeasible, the area of RNA secondary structures is an example where computational methods have been highly successful.
The first practical dynamic programming algorithms to predict the optimal secondary structure of an RNA sequence date back over 20 years (1). Since then they have been extended to allow prediction of suboptimal structures (2,3) and thermodynamic ensembles (4), which make it possible to assign a confidence level or ‘well-definedness’ to the predictions (5).
Recently, several methods have addressed the problem of predicting a consensus structure for a group of related RNA sequences (6–11). Such conserved structures are of particular interest, since conservation of structure in spite of sequence variation implies that the structure must be functionally important. By enhancing energy rules with sequence covariation these methods also obtain much better prediction accuracies.
The Vienna RNA package (12) is a free software package that implements a variety of algorithms for the prediction and analysis of RNA secondary structures. The package is, however, strongly geared toward Unix command-line users and programmers. For the less computer savvy, or occasional user, it provides neither a point-and-click graphical user interface nor even pre-compiled binaries.
The Vienna RNA web site tries to address these shortcomings by offering access to the most popular features via an easy to use web interface. It consists of three CGI scripts equivalent to the RNAfold, RNAalifold and RNAinverse command line programs, respectively. While the servers have to limit request sizes for performance reasons, they return for each request an equivalent command line invocation. This makes it easier for users to make the transition to locally installed software, should their requirements exceed the limits of the web service.
Of the three services, the RNAfold server provides both the most basic and most widely used function. Input consists of a single sequence that has to be typed or pasted into a text field of the input form.
In the simplest case, the server predicts only the minimum free energy (mfe) structure of a single sequence using the classic algorithm of Zuker and Stiegler (1). In addition to mfe folding the server can calculate equilibrium base pairing probabilities via John McCaskill's partition function algorithm (4).
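To illustrate the dynamic programming idea behind mfe folding, the following minimal sketch maximizes the number of base pairs in a sequence (a Nussinov-style recursion). It is not the algorithm of Zuker and Stiegler used by the server, which minimizes free energy under a loop-based energy model; the function name `max_base_pairs` and the minimum hairpin loop size of 3 are assumptions made for this example.

```python
# Simplified stand-in for mfe folding: maximize the number of base pairs
# instead of minimizing free energy (assumption made for illustration).
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq.

    min_loop is the minimum number of unpaired bases enclosed by a hairpin.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j stays unpaired.
            score = best[i][j - 1]
            # Case 2: position j pairs with some k, splitting the interval.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in CAN_PAIR:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

The energy-based algorithm has the same overall shape, filling a table over subsequences of increasing length, but distinguishes loop types and sums their energy contributions rather than counting pairs.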
The fold server output consists of a static html page presenting the predicted mfe structure as a string in bracket notation and links to the plots generated for visualization. Three types of plots can be produced. Firstly, the predicted mfe structure is plotted as a conventional secondary structure graph using the naview layout method (15). The pair probabilities can be visualized in a so-called ‘dot plot’: on a square grid of n×n we draw for each possible pair (i, j) a box with area proportional to its probability. Finally, we produce a mountain plot depicting both the predicted mfe and pair probabilities. A mountain plot is an xy-graph that plots the number of base pairs enclosing a sequence position (for pair probabilities the average number of enclosing pairs). See Figure 1 for examples of all three representations.
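The relation between bracket notation and the mountain plot can be made concrete with a short sketch. The function name `mountain_heights` is hypothetical, and we assume here that a paired position is not counted as enclosed by its own pair; other conventions shift the curve by one at paired positions.

```python
def mountain_heights(structure):
    """Return, for each position, the number of base pairs enclosing it.

    structure is a string in bracket notation: '(' and ')' mark the two
    partners of a base pair, '.' marks an unpaired base.
    """
    heights = []
    depth = 0  # number of currently open base pairs
    for ch in structure:
        if ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure: unmatched ')'")
        elif ch not in "(.":
            raise ValueError("unexpected character: %r" % ch)
        heights.append(depth)
        if ch == "(":
            depth += 1
    if depth != 0:
        raise ValueError("unbalanced structure: unmatched '('")
    return heights
```

For the hairpin `((..))` this yields the heights 0, 1, 2, 2, 1, 0: a single peak, which is how a stem-loop appears in a mountain plot.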
Secondary structure drawing and dot plots are always produced in Postscript format. Postscript is used not only because it gives the highest print quality, but also because it allows the actual data to be embedded in the file, e.g. all pair probabilities are contained in the dot plot in an easy to parse format. On the other hand, Postscript files cannot be used for inline images on web pages and require additional software for viewing (e.g. gsview, http://www.ghostscript.com/).
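As a sketch of how the embedded data might be recovered, the following function extracts pair probabilities from the text of a dot plot file. It assumes that each pair is stored on its own line in the form `i j value ubox`, with 1-based positions and `value` equal to the square root of the pair probability (the side length of a box whose area is proportional to the probability). This layout is an assumption and should be checked against an actual file; the function name `read_pair_probabilities` is hypothetical.

```python
import re

# Assumed line layout: "<i> <j> <sqrt(p)> ubox"
UBOX_LINE = re.compile(r"^\s*(\d+)\s+(\d+)\s+([0-9.eE+-]+)\s+ubox\s*$")


def read_pair_probabilities(ps_text):
    """Map (i, j) to pair probability for every 'ubox' line in ps_text."""
    probabilities = {}
    for line in ps_text.splitlines():
        match = UBOX_LINE.match(line)
        if match is None:
            continue  # Postscript code, comments or other data lines
        i = int(match.group(1))
        j = int(match.group(2))
        root = float(match.group(3))
        # The stored value is assumed to be sqrt(p), so square it.
        probabilities[(i, j)] = root * root
    return probabilities
```

All other lines are ignored, so the function can be applied to the complete file contents, for example `read_pair_probabilities(open(path).read())`.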
A suitable alternative is the new standard for Scalable Vector Graphics, SVG (http://www.w3.org/Graphics/SVG). Users with SVG enabled browsers (typically through the use of Adobe's SVG plugin, http://www.adobe.com/svg/) can request structure drawings in SVG, which allows some interactivity such as toggling annotation. Currently the server accepts sequences up to a maximum length of 4000nt. Sequences up to 300nt are processed immediately, while longer jobs are submitted to a batch queue, in which case the user is notified by email after completion.
The Alifold service predicts the consensus secondary structure for a set of aligned RNA or DNA sequences by using modified dynamic programming algorithms that add a covariance term to the standard energy model (11). Again, it supports prediction of mfe structures and pair probabilities. Usage is almost identical to that of the RNAfold service. Instead of typing an input sequence, a precomputed sequence alignment is uploaded via the input form. Currently, only alignments in Clustal format are accepted. The server restricts both the size of the upload and the length of the alignment, current limits being 10Kb and 2000nt, respectively.
Results are again visualized in Postscript plots that are enhanced by information on sequence variation. In the structure drawings, mutations supporting the predicted structure are marked by circles; in the dot plots and mountain plots, color is used to indicate the number of different pair types. Examples and detailed explanation of these representations can be found on the online help page (http://www.tbi.univie.ac.at/~ivo/RNA/alifoldcgi.html).
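The notion of 'number of different pair types' can be illustrated with a small sketch that collects the distinct canonical pairs observed at two alignment columns. The function name `pair_types`, the 0-based column indices and the set of allowed pairs (Watson-Crick plus GU) are assumptions made for this example; it is not the covariance term of the Alifold energy model.

```python
# Pairs treated as compatible in this sketch (assumption): Watson-Crick and GU.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def pair_types(alignment, i, j):
    """Return the set of distinct canonical pair types at columns i and j.

    alignment is a list of equally long, gapped sequences; i and j are
    0-based column indices. DNA input is handled by reading T as U.
    """
    seen = set()
    for row in alignment:
        pair = (row[i] + row[j]).upper().replace("T", "U")
        if pair in CANONICAL:
            seen.add(pair)
    return seen
```

For the alignment `["GCAAAGC", "GGAAACC", "GUAAAAC"]`, columns 1 and 5 show three pair types (CG, GC and UA), that is, compensatory mutations supporting the pair, whereas columns 0 and 6 show only GC.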
Finding sequences that fold into a predefined structure is the inverse of the structure prediction problem. Often it is useful to design such sequences, e.g. in order to experimentally test a hypothesis about functional structures. While this is often done manually for very short sequences, manual design quickly becomes tedious and error prone as sequences get longer.
Our inverse folding service treats sequence design as an optimization problem in sequence space that is solved heuristically (12). There are again two variants based on mfe and partition function folding. In the first case we minimize the dissimilarity between the predicted mfe structure and the desired target structure. In the second case we optimize the frequency of the target structure in the thermodynamic ensemble. While the mfe optimization typically yields sequences that are marginally stable, i.e. have many alternative foldings, optimization via the partition function produces sequences with a very strong preference for the target structure.
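One simple way to quantify the dissimilarity between a predicted and a target structure is the base pair distance, the number of base pairs present in exactly one of the two structures. The sketch below uses this measure as an assumption for illustration; the text above does not specify which distance the service uses, and the function names are hypothetical.

```python
def base_pairs(structure):
    """Return the set of (i, j) pairs encoded by a bracket-notation string."""
    open_positions = []
    pairs = set()
    for pos, ch in enumerate(structure):
        if ch == "(":
            open_positions.append(pos)
        elif ch == ")":
            if not open_positions:
                raise ValueError("unmatched ')' at position %d" % pos)
            pairs.add((open_positions.pop(), pos))
        elif ch != ".":
            raise ValueError("unexpected character: %r" % ch)
    if open_positions:
        raise ValueError("unmatched '(' at position %d" % open_positions[-1])
    return pairs


def base_pair_distance(predicted, target):
    """Count base pairs that occur in only one of the two structures."""
    if len(predicted) != len(target):
        raise ValueError("structures must have equal length")
    # Symmetric difference: pairs missing from the prediction plus
    # pairs predicted but absent from the target.
    return len(base_pairs(predicted) ^ base_pairs(target))
```

A distance of zero means that the candidate sequence folds exactly into the target, which is the stopping condition one would expect for the mfe variant of the search.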
Input consists simply of the desired structure in bracket notation. The maximum structure length is currently 100nt. The time needed for the search varies widely depending on the ubiquity of the target structure. Most valid secondary structure strings never occur as the mfe structure of any sequence (i.e. many sequence design problems have no solution), while some others are extremely common (see, for example, 16). Conversely, the number of search steps performed by the algorithm is a good indicator for the frequency of a structure in sequence space.
The Vienna RNA secondary structure server presented here provides only basic access to a subset of the functions in the Vienna RNA software package. Nevertheless, it provides a convenient interface for users who need RNA structure prediction only occasionally and a shallow learning curve for those new to the field.
The output web page produced by the server is designed for the interactive user and thus is not ideal for automatic parsing and further processing of the results. To facilitate such interoperation with other programs and web services we plan to offer input and output in a standardized data exchange format. A promising candidate for this is the recently proposed RNAML format (17), an XML based language for the storage of information on RNA sequence and structure.
While the server currently runs on a somewhat dated dual Pentium II 450MHz machine, the use of a batch queuing system allows jobs to be distributed to other machines should that become necessary.
This work is supported by the Austrian Fonds zur Förderung der Wissenschaftlichen Forschung, Projects FWF 15893 and P-13545-MAT.