Given an mRNA sequence as input, the OligoWalk web server generates a list of small interfering RNA (siRNA) candidate sequences, ranked by their probability of being efficient siRNA (silencing efficacy greater than 70%). To accomplish this, the server predicts the free energy changes of hybridization of an siRNA to a target mRNA, considering both siRNA and mRNA self-structure. The free energy changes of the structures are rigorously calculated using a partition function calculation. Through advanced options, the free energy changes can instead be calculated with the less rigorous lowest free energy structure or suboptimal structure prediction methods, for comparison. Considering the predicted free energy changes and local siRNA sequence features, the server selects efficient siRNA with high accuracy using a support vector machine. On average, 78.6% of the siRNAs that the server selects as efficient will be efficient at silencing. The OligoWalk web server is freely accessible on the internet at http://rna.urmc.rochester.edu/servers/oligowalk.
It is well known that genes can be silenced by antisense RNA oligonucleotides called small interfering RNA (siRNA) (1,2). To design an efficient siRNA sequence, empirical rules based on the features of the siRNA sequence have been discovered, including, for example, low G/C content, lack of self-structure, a preference for A at position 3, absence of G or C at position 19 and asymmetry in the stability of the terminal base pairs (3–10). The self-structure of the target and of the oligonucleotide is also an important consideration for effective binding (11–15). It is desirable to select an oligonucleotide having high accessibility to the target-binding site and low duplex stability. Here, the OligoWalk server is described; it predicts efficient siRNA sequences using an accessibility calculation and offers a convenient web interface. Overall, the positive predictive value of the server is 0.786, meaning that 78.6% of the siRNAs selected by the server will be efficient at silencing (16). The positive predictive value was determined by testing against a database of siRNA experiments conducted under diverse experimental conditions (17).
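The positive predictive value quoted above is the fraction of predicted positives that are true positives, PPV = TP/(TP + FP). The short Python sketch below only restates that arithmetic; the counts in it are hypothetical and are not taken from the validation set (17).

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of siRNAs predicted to be efficient that are efficient in experiments."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        raise ValueError("no siRNAs were predicted to be efficient")
    return true_positives / predicted_positive


# Hypothetical counts, chosen only to illustrate the arithmetic:
# 786 of 1000 selected siRNAs confirmed as efficient gives a PPV of 0.786.
ppv = positive_predictive_value(786, 214)
```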
In the OligoWalk server's calculation, unimolecular and bimolecular self-structures of the siRNA are considered, along with unimolecular self-structure of the target at the oligonucleotide binding region (16). These structures are in equilibrium with each other and with the hybridized state. OligoWalk predicts the free energy changes (ΔG) involved in these equilibria (18). The predicted thermodynamics (ΔG), together with oligonucleotide sequence features (19), are then used to predict efficacy for candidate siRNA sequences (16), which are generally 19-nucleotide duplexes with 3′ dinucleotide dangling ends (7). A support vector machine (SVM) program (20) embedded in the server takes the thermodynamic and sequence features as input. The SVM classification model used in the server has been shown to predict efficient siRNA (greater than 70% inhibition of target mRNA expression) with high accuracy (16). The SVM was trained on an siRNA database containing 2431 experimental results obtained in human cells at 37°C (10).
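As a rough illustration of how a predicted ΔG translates into binding at equilibrium, the sketch below converts a standard free energy change into an equilibrium constant, K = exp(−ΔG/RT), and then into the fraction of target hybridized under a simple two-state model with the oligonucleotide in large excess. This is a simplification assumed for illustration, not OligoWalk's own treatment (18), which also accounts for the competing self-structures; the function names and the 1 µM concentration are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def equilibrium_constant(delta_g_kcal: float, temp_c: float = 37.0) -> float:
    """K = exp(-dG / RT) for a standard free energy change dG (kcal/mol)."""
    temp_k = temp_c + 273.15
    return math.exp(-delta_g_kcal / (R_KCAL * temp_k))


def fraction_target_bound(delta_g_kcal: float, oligo_conc_m: float,
                          temp_c: float = 37.0) -> float:
    """Two-state estimate of the fraction of target in the hybridized state.

    Assumes the oligonucleotide is in large excess, so its free concentration
    is approximately its total concentration (mol/L).
    """
    k_assoc = equilibrium_constant(delta_g_kcal, temp_c)  # association constant, 1/M
    bound_ratio = k_assoc * oligo_conc_m                   # [duplex] / [free target]
    return bound_ratio / (1.0 + bound_ratio)


# A more negative dG gives tighter binding at the same oligo concentration.
print(fraction_target_bound(-10.0, 1e-6), fraction_target_bound(-14.0, 1e-6))
```

Because this model ignores target and oligonucleotide self-structure, it will overestimate binding wherever the Break-targ, Intraoligo or Interoligo terms are significant.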
The input is the sequence of the target RNA. Advanced options are available for expert users to customize the calculation. The output of the OligoWalk server is a table of siRNA candidates showing the siRNA sequences and their probabilities of being efficient (silencing efficacy greater than 70%). The individual free energy change terms for each candidate are listed in a separate table.
The OligoWalk server uses the CGI (Common Gateway Interface) module of Perl to take user input and submit calculations from the homepage. The input to the OligoWalk server is the RNA sequence of the target gene. Only the nucleotides A, U, T, G and C are accepted in the sequence (the server replaces T with U for calculations), and the maximum sequence length is 10 000 nucleotides. An email address is required because the server sends an email to the user when the calculation is completed. Online help is available at the ‘Help’ hyperlink. When the user clicks ‘Submit Query’, the server generates a list of efficient siRNA candidates for the target gene. Jobs are submitted by the server to a cluster of seven nodes with 3.2 or 3.4 GHz Pentium 4 processors running Fedora Linux (http://fedoraproject.org/), managed by Sun Grid Engine (http://gridengine.sunsource.net/). The default siRNA candidate is an RNA oligonucleotide of 19 nucleotides.
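The input rules above (allowed letters A, U, T, G and C; T converted to U; at most 10 000 nucleotides) can be sketched as a validation function. The server itself is written with Perl CGI; this Python version is only an illustration, and ignoring whitespace and letter case is an added assumption, not documented server behavior.

```python
VALID_BASES = frozenset("AUTGC")
MAX_LENGTH = 10_000


def normalize_target(seq: str) -> str:
    """Validate a target sequence and convert T to U.

    Stripping whitespace and accepting lower case are assumptions added for
    this sketch, not documented server behavior.
    """
    cleaned = "".join(seq.split()).upper()
    if not cleaned:
        raise ValueError("empty sequence")
    bad = sorted(set(cleaned) - VALID_BASES)
    if bad:
        raise ValueError(f"unsupported nucleotide(s): {', '.join(bad)}")
    if len(cleaned) > MAX_LENGTH:
        raise ValueError(f"sequence is {len(cleaned)} nt; the limit is {MAX_LENGTH}")
    return cleaned.replace("T", "U")
```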
When the calculation is complete, an HTML (HyperText Markup Language) page is generated with links to tables containing predicted siRNA efficacy data and thermodynamic binding data. In the siRNA efficacy table (Figure 1), the siRNA candidates are ranked by their probabilities of being efficient siRNA. These probabilities are predicted by an SVM embedded in the web server. The classification model (16) used by the SVM was trained on a publicly available database (10), using thermodynamic and sequence features of siRNA candidates. The position number of each siRNA candidate is also listed in the table; it is the index of the 5′-most base of the target-binding region.
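Candidate enumeration and ranking can be sketched as follows, assuming 19-nucleotide candidates, the position convention described above (1-based index of the 5′-most target base in the binding region) and an antisense strand equal to the reverse complement of the target site. The 3′ dinucleotide overhangs are ignored, and `score` (with the toy scorer `toy_au_fraction`) is a hypothetical stand-in for the server's SVM, which is not reproduced here.

```python
from typing import Callable, List, Tuple

# Watson-Crick complement for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def candidate_sites(target: str, length: int = 19) -> List[Tuple[int, str, str]]:
    """Return (position, target_site, antisense) for every window of the target.

    position is the 1-based index of the 5'-most target base in the binding
    region; antisense is the reverse complement of the site (5'->3').
    """
    sites = []
    for start in range(len(target) - length + 1):
        site = target[start:start + length]
        antisense = site.translate(COMPLEMENT)[::-1]
        sites.append((start + 1, site, antisense))
    return sites


def rank_candidates(target: str, score: Callable[[str], float],
                    length: int = 19) -> List[Tuple[int, str, float]]:
    """Sort candidates by score (e.g. probability of being efficient), best first."""
    scored = [(pos, antisense, score(antisense))
              for pos, _, antisense in candidate_sites(target, length)]
    return sorted(scored, key=lambda row: row[2], reverse=True)


def toy_au_fraction(antisense: str) -> float:
    """Hypothetical toy scorer: fraction of A/U in the antisense strand."""
    return sum(base in "AU" for base in antisense) / len(antisense)


top_three = rank_candidates("AUGGCUAGCUAGGAUCCAUGCUAGCAAUG", toy_au_fraction)[:3]
```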
In addition, a table of predicted equilibrium thermodynamics is generated as a reference for advanced users. In this table, the position number and sequence of each siRNA candidate appear with the thermodynamic terms. ‘Overall’ (in kcal/mol) is the overall free energy change of oligonucleotide–target binding when all contributions are considered, including breaking target and oligonucleotide self-structures (18). A more negative value indicates tighter binding. This term depends on the oligonucleotide concentration. ‘Duplex’ (in kcal/mol) is the free energy change of forming the hybridized duplex between oligonucleotide and target (antisense–sense duplex). This value is independent of oligonucleotide concentration because it is a standard free energy change. ‘Tm-Dup’ (in °C) is the melting temperature of the duplex formed by the oligonucleotide and target. ‘Break-targ’ (in kcal/mol) is the free energy cost of opening the intramolecular target base pairs for oligonucleotide binding. A more negative number indicates a higher free energy cost, which is unfavorable for oligonucleotide–target binding. ‘Intraoligo’ (in kcal/mol) is the free energy change of intramolecular oligonucleotide structure. It is usually negative or, if there is no favorable intramolecular structure, zero. ‘Interoligo’ (in kcal/mol) is the free energy change of intermolecular oligonucleotide structure. A negative number indicates a stable antisense–antisense bimolecular structure, which decreases the oligonucleotide–target (antisense–sense) binding affinity. ‘End_diff’ (in kcal/mol) is the free energy difference between the 5′ and 3′ ends of the antisense strand of the siRNA, calculated with windows of two base pairs. Functional siRNAs prefer an unstable 5′ end (3), which corresponds to a positive End_diff. ‘Prefilter_score’ is the score calculated with a method based on the empirical rules of Reynolds et al. (7). All scores are calculated as in Reynolds et al. (7), except for the melting temperature of intramolecular oligonucleotide self-structure, because the free energy (21) and enthalpy (22) parameters used by OligoWalk are more recent. When calculating the prefilter score, 57°C is used as the cutoff for the intramolecular oligonucleotide melting temperature, as suggested in another study (23).
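Two of the table terms can be illustrated with short formulas. The sketch assumes that End_diff is the free energy of the terminal nearest-neighbor stack (a two-base-pair window) at the antisense 5′ end minus that at the 3′ end, which matches the stated sign convention that an unstable 5′ end gives a positive value; the stack free energies must be supplied from a parameter set such as (21), which is not reproduced here. For Tm-Dup it uses the standard two-state expression Tm = ΔH°/(ΔS° + R ln C) for an oligonucleotide in excess at concentration C; the exact concentration and conventions OligoWalk uses are not stated here, so the input values below are hypothetical.

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def end_diff(stack_dg_5to3: Sequence[float]) -> float:
    """dG of the 5'-terminal stack minus dG of the 3'-terminal stack.

    stack_dg_5to3 lists nearest-neighbor stack free energies (kcal/mol) along
    the antisense strand from 5' to 3'; one stack spans a two-base-pair window.
    """
    if len(stack_dg_5to3) < 2:
        raise ValueError("need at least two stacks")
    return stack_dg_5to3[0] - stack_dg_5to3[-1]


def duplex_tm_celsius(dh_kcal: float, ds_kcal_per_k: float, oligo_conc_m: float) -> float:
    """Two-state Tm = dH / (dS + R ln C) for an oligonucleotide in excess at C (mol/L)."""
    tm_kelvin = dh_kcal / (ds_kcal_per_k + R_KCAL * math.log(oligo_conc_m))
    return tm_kelvin - 273.15


# Hypothetical inputs: a weak 5' stack and a strong 3' stack give a positive End_diff.
print(end_diff([-0.9, -2.1, -3.3]))
print(duplex_tm_celsius(-150.0, -0.42, 1e-6))
```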
As an example, the predictions of the web server are compared with experimental results in Figure 2. In the experiment (3), siRNAs were tested for efficacy against the target mRNA, human cyclophilin (GenBank ID: M60857), at 37°C. The inhibition efficacy of each siRNA is defined as 100% minus the mRNA level after siRNA application, expressed as a percentage of the matched control. The prediction result is the probability of being efficient (inhibition efficacy greater than 70%), calculated with the server. In Figure 2, most of the siRNAs with high inhibition efficacy are predicted to have a high probability of being efficient.
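The efficacy definition and the 70% classification threshold reduce to the following arithmetic. The function names are hypothetical, and treating exactly 70% as not efficient follows the "greater than 70%" wording above.

```python
def inhibition_efficacy(mrna_percent_of_control: float) -> float:
    """Efficacy (%) = 100% minus remaining mRNA as a percentage of matched control."""
    return 100.0 - mrna_percent_of_control


def is_efficient(mrna_percent_of_control: float, threshold: float = 70.0) -> bool:
    """Efficient means inhibition efficacy strictly greater than the threshold."""
    return inhibition_efficacy(mrna_percent_of_control) > threshold


# 25% of control mRNA remaining -> 75% efficacy -> efficient.
# 30% remaining -> exactly 70% efficacy -> not efficient under "greater than 70%".
print(is_efficient(25.0), is_efficient(30.0))  # prints "True False"
```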
If the target RNA secondary structure is considered, three prediction methods are available to calculate the free energy change of target self-structure. The first is optimal structure prediction, in which only the optimal (lowest free energy) structure of the target is used to calculate the free energy cost of opening the base pairs in the binding region. The second considers a set of suboptimal structures: each structure's free energy cost is weighted according to the structure's free energy change to arrive at an ensemble cost. For this option, at most 1000 suboptimal structures (within 10% free energy difference from the optimal structure) are generated with a heuristic method (24). When the target is folded with the suboptimal structure method, the numbers of suboptimal structures are listed in two columns of the output table. The first column gives the number of target structures predicted before oligonucleotide binding. The second gives the number of constrained target structures; a constrained target structure is a refolded structure in which the binding region is forced to be single-stranded so that the oligonucleotide can bind to it. The third, and default, option is a partition function calculation (25). This is the most rigorous method because it considers every possible secondary structure in the folding ensemble, each weighted by its Boltzmann factor.
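The two ensemble treatments can be contrasted on a toy, explicitly enumerated set of target structures. The first function applies the suboptimal-structure rule above (at most 1000 structures within 10% of the optimal free energy) and weights each structure's opening cost by its Boltzmann factor, which is our reading of "weighted according to the free energy change". The second uses the common partition-function form ΔG_open = −RT ln(Q_constrained/Q), where Q_constrained sums only structures leaving the binding site unpaired. Real partition function codes (25) compute Q by dynamic programming rather than enumeration, and the sign convention here (a positive cost) is the opposite of the Break-targ column, where a more negative number means a higher cost.

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3   # gas constant, kcal/(mol*K)
T_K = 310.15           # 37 degrees C in kelvin


def boltzmann_weighted_cost(structure_dg: Sequence[float],
                            opening_cost: Sequence[float],
                            max_structures: int = 1000,
                            pct_window: float = 10.0) -> float:
    """Ensemble opening cost over suboptimal target structures.

    structure_dg[i]: free energy of structure i (kcal/mol, negative = stable).
    opening_cost[i]: cost of opening the binding-region pairs in structure i.
    """
    if not structure_dg or len(structure_dg) != len(opening_cost):
        raise ValueError("need one opening cost per structure")
    best = min(structure_dg)
    cutoff = best + abs(best) * pct_window / 100.0  # within 10% of the optimum
    kept = sorted((dg, cost) for dg, cost in zip(structure_dg, opening_cost)
                  if dg <= cutoff)[:max_structures]
    # Shift by the optimum before exponentiating to avoid overflow.
    weights = [math.exp(-(dg - best) / (R_KCAL * T_K)) for dg, _ in kept]
    return sum(w * cost for w, (_, cost) in zip(weights, kept)) / sum(weights)


def opening_cost_from_partition(structure_dg: Sequence[float],
                                site_unpaired: Sequence[bool]) -> float:
    """dG_open = -RT ln(Q_constrained / Q) over an explicit toy ensemble."""
    best = min(structure_dg)
    weights = [math.exp(-(dg - best) / (R_KCAL * T_K)) for dg in structure_dg]
    q_total = sum(weights)
    q_constrained = sum(w for w, free in zip(weights, site_unpaired) if free)
    if q_constrained == 0.0:
        raise ValueError("no structure leaves the binding site unpaired")
    return -R_KCAL * T_K * math.log(q_constrained / q_total)
```

The two functions are different estimators and need not agree; the partition-function form is the one that includes every structure in the ensemble.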
Structure prediction folds only a fixed total number of nucleotides (the folding size) centered on the binding region. The user can set this number, but the maximum folding size on the web server is 1000 nucleotides, to limit computation time. Users who need longer folding sizes can download and install the OligoWalk program on a local machine. A prefilter based on the scoring method of Reynolds et al. (7) can be used to rule out inefficient siRNA candidates before folding the target sequence, i.e. siRNA sequences scoring less than six points are not considered in the folding step. Turning on the prefilter is recommended because it saves considerable computation time (Table 1). Furthermore, the scan region can be redefined if the user is interested only in a specific region of the target.
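The folding window and the prefilter gate can be sketched as below. The window is centered on the binding site and slid inward at the ends of the target, an edge-handling choice assumed here. The score uses the eight criteria as commonly summarized from Reynolds et al. (7): 30–52% G/C content, one point per A or U at sense positions 15–19, no stable internal structure, A at 19, A at 3, U at 10, minus one for G/C at 19 and minus one for G at 13. These weights should be checked against (7) before use. Following the text, the internal-structure criterion uses the 57°C cutoff, and the melting temperature must be supplied by the caller because computing it requires the folding parameters (21,22).

```python
def folding_window(target_len: int, site_start: int, site_len: int,
                   folding_size: int, server_limit: int = 1000) -> tuple:
    """0-based half-open [begin, end) of the region folded around a site.

    site_start is the 0-based index of the first target base in the site.
    """
    if folding_size > server_limit:
        raise ValueError(f"folding size above the web server limit of {server_limit}")
    size = min(max(folding_size, site_len), target_len)
    begin = site_start - (size - site_len) // 2           # center on the site
    begin = max(0, min(begin, target_len - size))          # slide inside target
    return begin, begin + size


def reynolds_score(sense: str, intra_tm_c: float, tm_cutoff_c: float = 57.0) -> int:
    """Prefilter score for a 19-nt sense strand (positions are 1-based)."""
    s = sense.upper().replace("T", "U")
    if len(s) != 19:
        raise ValueError("expected a 19-nt sense strand")

    def at(pos: int) -> str:
        return s[pos - 1]

    score = 0
    gc_percent = 100.0 * (s.count("G") + s.count("C")) / 19
    if 30.0 <= gc_percent <= 52.0:
        score += 1
    score += sum(1 for base in s[14:19] if base in "AU")  # positions 15-19
    if intra_tm_c < tm_cutoff_c:  # no stable internal structure
        score += 1
    if at(19) == "A":
        score += 1
    if at(3) == "A":
        score += 1
    if at(10) == "U":
        score += 1
    if at(19) in "GC":
        score -= 1
    if at(13) == "G":
        score -= 1
    return score


def passes_prefilter(sense: str, intra_tm_c: float, min_score: int = 6) -> bool:
    """Candidates scoring below six points are skipped before folding."""
    return reynolds_score(sense, intra_tm_c) >= min_score
```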
The OligoWalk web server predicts the hybridization thermodynamics of an oligonucleotide binding to a complementary target RNA using the most recent RNA folding parameters (21,22). It predicts efficient siRNA with high accuracy using a transparent implementation of an SVM (16) that considers both sequence and thermodynamic features. The calculation time and memory use of OligoWalk are shown in Table 1 for a sample of mRNA sequences. The prefilter (7), which uses local sequence information to narrow the list of siRNA candidates before the equilibrium affinity is calculated, is used by default. Its use is recommended because the partition function calculation is time consuming. For example, the server takes 3 h and 43 min for a complete scan of all possible siRNAs on an mRNA of 730 nucleotides using the partition function calculation. For the same sequence, the time is only 57 min when the prefilter is turned on. The computation time scales as O(mN³) and memory use as O(N²) (Table 1), where m is the number of candidates and N is the folding size. The time and memory costs vary little between sequences because the same folding size (e.g. 800 nucleotides) is used and the prefilter (7) is turned on, which limits the number of candidates to be folded in a way that is apparently independent of target length.
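The stated scaling, time O(mN³) and memory O(N²), allows quick back-of-the-envelope estimates from a measured run. The helper names are hypothetical. Applied to the 730-nucleotide example, the prefilter reduced the time from 223 min to 57 min, a ratio of about 0.26; with the folding size unchanged, O(mN³) would attribute this to the prefilter keeping roughly a quarter of the candidates, but that is an inference that ignores fixed overheads, since the candidate counts are not reported here.

```python
def scaled_time(t_ref: float, m_ref: int, n_ref: int, m_new: int, n_new: int) -> float:
    """Rescale a measured run time assuming time ~ m * N**3."""
    return t_ref * (m_new / m_ref) * (n_new / n_ref) ** 3


def scaled_memory(mem_ref: float, n_ref: int, n_new: int) -> float:
    """Rescale measured memory assuming memory ~ N**2 (independent of m)."""
    return mem_ref * (n_new / n_ref) ** 2


# Doubling the folding size: about 8x the time and 4x the memory.
print(scaled_time(1.0, 100, 400, 100, 800), scaled_memory(1.0, 400, 800))  # prints "8.0 4.0"

# 730-nt example from the text: full scan 3 h 43 min vs 57 min with the prefilter.
full_scan_min = 3 * 60 + 43
prefilter_min = 57
implied_candidate_fraction = prefilter_min / full_scan_min  # ~0.26 if N is unchanged
```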
There is currently significant interest in using siRNA for both basic science and medical research. The fact that not all siRNA duplexes will function in silencing means that there is a significant cost in trial and error for siRNA design. The OligoWalk server for siRNA design can mitigate this cost.
The design of the server was supported by the National Institutes of Health with grant R01GM076485 to D.H.M. Funding to pay the Open Access publication charges for this article was provided by the National Institutes of Health.
Conflict of interest statement. None declared.