The design of RNA interference (RNAi) reagents is an essential step for performing loss-of-function studies in many experimental systems. The availability of sequenced and annotated genomes greatly facilitates RNAi experiments in an increasing number of organisms that were previously not genetically tractable. The E-RNAi web-service, accessible at http://www.e-rnai.org/, provides a computational resource for the optimized design and evaluation of RNAi reagents. The 2010 update of E-RNAi now covers 12 genomes, including Drosophila, Caenorhabditis elegans, human, emerging model organisms such as Schmidtea mediterranea and Acyrthosiphon pisum, as well as the medically relevant vectors Anopheles gambiae and Aedes aegypti. The web service calculates RNAi reagents based on the input of target sequences, sequence identifiers or by visual selection of target regions through a genome browser interface. It identifies optimized RNAi target-sites by ranking sequences according to their predicted specificity, efficiency and complexity. E-RNAi also facilitates the design of secondary RNAi reagents for validation experiments, evaluation of pooled siRNA reagents and batch design. Results are presented online, as a downloadable HTML report and as tab-delimited files.
Functional genomic studies by RNA interference (RNAi) are now widely used to determine gene functions in a variety of cell-based and in vivo model systems (1). RNAi is triggered by exogenous double-stranded (ds) RNAs causing the degradation of endogenous messenger RNAs (2). The conservation of RNAi from plants to humans (3) allows its application in a broad spectrum of model organisms for which genome sequences and gene annotations are available.
RNAi experiments in invertebrate model systems mainly use long dsRNAs (100–700-bp long) (4) that are intracellularly cleaved into 19–23-bp long siRNAs by Dicer (5). In vertebrate cells, siRNAs (21-nt duplexes) are used to trigger mRNA knock-downs (6). Over the past years, many sequence properties affecting the silencing specificity and efficiency of siRNAs have been identified. Unintended effects can for example be caused by the overall homology of siRNAs to ‘off-target’ mRNAs (7), miRNA-like seed regions contained within siRNAs (8,9), and low-complexity regions (e.g. CA[ACGT] or CAN repeats) (10). Sequences that contain such motifs should be avoided in the design of long dsRNAs and siRNAs. Key parameters that influence the efficiency of introduced siRNAs were discovered in optimization studies in mammalian cells, including differential end stability of siRNAs, GC content, unstable or no secondary structures of the siRNAs and their targets as well as several base preferences (11–13).
Considering the potential pitfalls of RNAi experiments resulting e.g. from unspecific effects or poor knock-down efficiency by RNAi reagents (14), computational tools are necessary to enable the optimized design and evaluation of long dsRNAs and siRNAs in a user-friendly manner. A limited number of web-based tools are available for the design of long dsRNAs (e.g. DEQOR (15), SnapDragon at http://www.flyrnai.org/). The collection of online services available for the design of siRNAs is diverse, including commercial design tools [e.g. siDESIGN Center (Dharmacon, ThermoScientific), BioPredsi (16), SVM siRNA design tool (13)] and tools that are focused on certain siRNA properties such as the prediction of target-site accessibility [RNAxs (17)] or siRNA efficiency [sIR (12)]. However, most tools are dedicated to a certain model organism, or do not support batch queries or the evaluation of pre-existing RNAi reagents.
Here, we present an updated version of the E-RNAi web service (18) that has been improved by the implementation of new design options and prediction of RNAi reagents for a spectrum of model organisms. E-RNAi now covers 12 model organisms that are tractable by RNAi (Table 1). The flexible data structure behind E-RNAi allows for the addition of further organisms. Besides long dsRNAs, siRNA reagents can be designed or evaluated, including a feature for the summarized analysis of siRNA pools. In addition, the user can visually select target regions for the de novo design of RNAi reagents from a genome browser interface [GBrowse (19)]. Once a target region is selected, different options such as specificity and efficiency parameters or primer-design settings can be adjusted. Additional quality measurements of RNAi reagents are available, including the evaluation of long dsRNA for low-complexity regions (e.g. CAN repeats), imperfect homologies to other target regions and recent improvements in efficiency calculation methods.
In addition, E-RNAi allows the straightforward design of independent RNAi reagents e.g. by submitting sequences to be excluded. The new version of E-RNAi incorporates extended reports, including the ability to export results as generic feature format (GFF) and annotation file format (AFF) to facilitate their visualization in a genome browser. A full report that contains all results can be downloaded as an archive file and locally displayed in any web browser. In addition, the implementation of new alignment tools [Bowtie (20), BLAT (21)] significantly improves the speed of E-RNAi. Batch queries with up to 50 designs can be analyzed in one query.
The E-RNAi web service automates the steps required for the design and evaluation of RNAi reagents. So far, queries for 12 genomes have been implemented, and further genomes can be added upon the request of users (Table 1). To design or evaluate RNAi reagents, E-RNAi performs several tasks, including: (i) defining target sequences and design options based on user-input, (ii) filtering of low-complexity regions from input sequences, (iii) in silico dicing of input sequences into all possible siRNAs, (iv) evaluation of predicted specificity and (v) predicted efficiency for each siRNA, (vi) design of optimal PCR primer pairs for all regions meeting criteria (ii–v) (for long dsRNAs only), (vii) sorting amplicons or siRNAs for specificity and efficiency, (viii) evaluation of reagents’ overall homology to other transcripts, (ix) mapping of reagents to the genome and (x) generation of HTML and tab-delimited reports. In case the user-defined criteria were not met (steps ii–vi) a re-design method (if enabled) will relax criteria and re-start the design process (at step vi). The general workflow and database structure of E-RNAi are shown in Figure 1.
E-RNAi offers three run options: ‘ID or sequence input’, ‘GBrowse input’ and ‘evaluation of RNAi reagents’. All are accessible from the entry webpage. The first two options allow the selection of target sequences for the de novo design of reagents. The third option enables the evaluation of existing RNAi reagents. All input options are available for long dsRNAs and siRNAs, and for all implemented genomes (red panel, Figure 1).
E-RNAi accepts sequences in FASTA or raw formats. The sequences are first mapped to the genome by a local Blat server (21) and returned to the user for target region selection. In case gene identifiers were queried E-RNAi lists sequences for the gene, annotated isoforms and exons (see Table 1 for examples). This also facilitates the selection of exons common to all transcripts (Figure 2A). A more detailed and interactive way to select target sequences is offered by the option to select target regions through a genome browser (GBrowse) interface (19). E-RNAi is connected to a local GBrowse installation where specific regions can be searched by gene-identifier or by absolute genome coordinates. Individual sequences can then be visually selected and submitted to E-RNAi as design templates (Figure 2B).
In case an organism is not available in E-RNAi, the web service can be run without the definition of an organism (‘No off-target database’). The design and evaluation of RNAi reagents is then solely based on efficiency and low-complexity predictions (as well as on primer designs).
dsRNA sequences have been shown to exert off-target effects via regions of low sequence-complexity (e.g. CAN repeats (10)) or homology of contained siRNAs to unintended target transcripts (7). E-RNAi avoids low-complexity regions by applying the mdust filter program (http://compbio.dfci.harvard.edu/tgi/software/) on the queried sequences. Mdust is used with default parameters that will exclude e.g. most simple nucleotide repeats and poly-triplet sequences. Additionally, sequences are filtered for stretches containing more than six contiguous CAN repeats. E-RNAi also ‘dices’ all queried sequences into siRNAs of a user-defined length (16–28 bp) using a 1 bp shifting window. These are mapped to the transcriptome using Bowtie (20) to assess their specificity. siRNAs that target multiple independent transcripts are flagged. The user can also provide a FASTA sequence file (e.g. containing 3′-UTR sequences) that is used by E-RNAi to calculate the number of siRNA seed-matches. siRNAs with a seed-match frequency above a user-defined cutoff are flagged.
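The dicing and CAN-repeat steps described above can be illustrated with a minimal Python sketch. It is not the E-RNAi implementation: the function names `dice` and `has_can_repeat` are hypothetical, the default length of 19 is an arbitrary choice within the 16–28 range, and 'more than six contiguous CAN repeats' is interpreted here as seven or more consecutive CA[ACGT] triplets.

```python
import re


def dice(sequence: str, length: int = 19) -> list[str]:
    """Return every window of `length` nt, shifting by 1 nt each time."""
    if not 16 <= length <= 28:
        raise ValueError("siRNA length must be between 16 and 28")
    seq = sequence.upper()
    # An input shorter than `length` yields an empty list.
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


# Assumption: "more than six contiguous CAN repeats" means
# seven or more CA[ACGT] triplets in a row.
CAN_REPEAT = re.compile(r"(?:CA[ACGT]){7,}")


def has_can_repeat(sequence: str) -> bool:
    """True if the sequence contains a stretch of >6 contiguous CAN triplets."""
    return CAN_REPEAT.search(sequence.upper()) is not None
```

A sequence of n nucleotides therefore yields n − length + 1 candidate siRNAs, each of which would then be passed to the specificity and efficiency steps.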
Additionally, designed reagents are evaluated for partial homologies to unintended targets by Blast (22) searches against the transcriptome with a user-defined E-value cutoff.
E-RNAi implements two scoring methods to predict the efficiency of siRNAs. The so-called ‘rational’ (11) and the ‘weighted’ (12) scoring methods were both developed from optimization studies using siRNA experiments in human cells. All siRNA features that are included in the score are listed in detail in Supplementary Table S1. Since the scoring methods result in different ranges, we normalized scores to a range between 0 and 100. According to the normalized score, Reynolds et al. (11) considered siRNAs with scores ≥66.7 as efficient silencers and Shah et al. (12) found siRNAs with scores ≥63 to be potent. One of the main differences between the two methods is that Reynolds et al. (11) consider the absence of ‘internal repeats’ (the melting temperature of potential hairpins should be below 20°C) a significant feature, whereas Shah et al. (12) did not. E-RNAi assesses potential secondary structures of siRNAs for the ‘rational’ method using the Vienna RNA package (23).
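The normalization of raw scores to a common 0–100 scale can be sketched as a linear (min-max) rescaling. This is an assumption about the form of the normalization; the text only states the target range. The raw score bounds are passed in as parameters because they are not given here, and the names `normalize_score` and `is_efficient` are hypothetical. The two cutoffs are the normalized values quoted above.

```python
RATIONAL_CUTOFF = 66.7  # normalized cutoff quoted for the 'rational' method (11)
WEIGHTED_CUTOFF = 63.0  # normalized cutoff quoted for the 'weighted' method (12)


def normalize_score(raw: float, raw_min: float, raw_max: float) -> float:
    """Linearly rescale a raw score from [raw_min, raw_max] to [0, 100]."""
    if raw_max <= raw_min:
        raise ValueError("raw_max must be greater than raw_min")
    # Clip so that out-of-range inputs cannot leave the 0-100 scale.
    clipped = min(max(raw, raw_min), raw_max)
    return 100.0 * (clipped - raw_min) / (raw_max - raw_min)


def is_efficient(normalized: float, cutoff: float) -> bool:
    """An siRNA counts as efficient if its normalized score reaches the cutoff."""
    return normalized >= cutoff
```

For example, on a hypothetical raw scale from 0 to 10, a raw score of 7 maps to 70 and would pass both cutoffs.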
In case of siRNAs, the calculated efficiency score directly refers to the efficiency of the sequence. siRNAs that do not match the user-defined efficiency criteria are flagged during the design.
Long dsRNAs contain many different siRNAs and scores are aggregated during processing. E-RNAi reports two scores for the efficiency of long dsRNAs: an average efficiency score of all contained siRNAs and the absolute number of efficient siRNAs (efficiency above the default or user-defined cutoff).
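The two aggregate values reported for a long dsRNA can be sketched as follows. The function name `summarize_dsrna` is hypothetical, and treating a score equal to the cutoff as efficient is an assumption.

```python
from statistics import mean


def summarize_dsrna(sirna_scores: list[float], cutoff: float) -> tuple[float, int]:
    """Return (average efficiency score, number of efficient siRNAs)."""
    if not sirna_scores:
        raise ValueError("a long dsRNA must contain at least one siRNA")
    average = mean(sirna_scores)
    # Assumption: scores equal to the cutoff count as efficient.
    n_efficient = sum(1 for score in sirna_scores if score >= cutoff)
    return average, n_efficient
```

The two numbers are complementary: a dsRNA can have a moderate average score while still containing a large absolute number of efficient siRNAs.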
The amplification of DNA templates for long dsRNAs by PCR from genomic or cDNA sources is a required step during the synthesis of long dsRNAs. E-RNAi uses primer3 (24) for the identification of suitable primers. The primer design can be influenced by user preferences (Table 2). We have found that the standardized primer design with primer3 facilitates similar PCR synthesis efficiency and the selection of smaller windows for the ‘amplicon size range’ (resulting in designs of comparable lengths) facilitates similar in vitro transcription reactions (data not shown). T7 or SP6 promoter sequences for in vitro transcription (or any other, individual tag) can be automatically added to the primer designs. The E-RNAi web service outputs a ready-to-use list for the synthesis of long dsRNAs.
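Adding a promoter tag to designed primers amounts to prepending the tag sequence to the 5′ end of each primer. The sketch below uses a commonly used T7 promoter sequence as the default tag; whether E-RNAi appends exactly this sequence is an assumption, and the function names are hypothetical.

```python
# Commonly used T7 promoter tag (assumed default, not taken from E-RNAi).
T7_PROMOTER = "TAATACGACTCACTATAGGG"


def add_tag(primer: str, tag: str = T7_PROMOTER) -> str:
    """Prepend a promoter (or any other) tag to the 5' end of a primer."""
    return tag.upper() + primer.upper()


def tag_primer_pair(forward: str, reverse: str,
                    tag: str = T7_PROMOTER) -> tuple[str, str]:
    """Tag both primers so that both strands can be transcribed in vitro."""
    return add_tag(forward, tag), add_tag(reverse, tag)
```

Any other tag can be supplied through the `tag` argument, which mirrors the option of adding an individual tag.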
During the design of long dsRNAs, E-RNAi identifies suitable target sites by filtering the input sequences for regions that fulfill the default or user-defined criteria for predicted specificity, efficiency and complexity. These ‘favorable’ regions are used for automated primer design. If no suitable primer designs can be found, E-RNAi connects the closest, ‘favorable’ neighbors (if enabled by the user). This step is repeated until the design of primers is successful. In the final scoring of all possible dsRNAs, the ‘preferred’ reagent is determined by ranking all successful designs according to their predicted specificity and efficiency. Specificity of a long dsRNA is calculated as a percentage of siRNAs targeting the intended transcripts. Efficiency refers to the absolute number of efficient siRNAs (user-defined) contained in the long dsRNA.
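The specificity measure and the final ranking can be sketched as below. The `Design` class and `rank_designs` function are hypothetical, and sorting by specificity first with the number of efficient siRNAs as a tie-breaker is an assumption; the text states only that both criteria are used.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    on_target_sirnas: int   # siRNAs targeting only the intended transcripts
    total_sirnas: int       # all siRNAs contained in the long dsRNA
    efficient_sirnas: int   # siRNAs with efficiency above the cutoff

    @property
    def specificity(self) -> float:
        """Percentage of contained siRNAs that target the intended transcripts."""
        if self.total_sirnas <= 0:
            raise ValueError("total_sirnas must be positive")
        return 100.0 * self.on_target_sirnas / self.total_sirnas


def rank_designs(designs: list[Design]) -> list[Design]:
    """Best design first: highest specificity, then most efficient siRNAs."""
    return sorted(designs,
                  key=lambda d: (d.specificity, d.efficient_sirnas),
                  reverse=True)
```

The first element of the ranked list would correspond to the ‘preferred’ reagent under these assumptions.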
The evaluation of RNAi reagents implements the same procedures as used for the design. E-RNAi reports e.g. how many siRNAs are contained in a long dsRNA predicted to be unspecific or how many siRNAs have an efficiency score above the user-defined cutoff.
As for the design of long dsRNAs, the design of siRNAs also implements filters to avoid regions that do not fulfill the user-defined criteria regarding predicted specificity, efficiency and complexity. siRNAs are finally ranked for specificity and efficiency.
Testing independent RNAi reagents is a necessary experimental step to confirm RNAi phenotypes. The updated version of E-RNAi offers several tools for their design. The user can upload sequences of pre-existing reagents (as a FASTA file) that will be excluded in the design process. Alternatively, independent target regions can be selected from sequence and gene-identifier queries or through the GBrowse interface (Figure 2).
E-RNAi results are presented as a webpage that can also be downloaded and locally viewed in any web browser. Linked pages provide detailed information for each design such as its sequence, primer and target information. An example result page for RNAi reagents targeting the Drosophila gene twi is shown in Figure 3. Information of RNAi designs is also provided as tab-delimited files, FASTA sequence files as well as GFF and AFF files.
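To illustrate what a GFF export of a mapped reagent might look like, the sketch below formats one feature as a nine-column, tab-delimited line in the GFF3 style. The exact GFF version, source label, feature type and attributes written by E-RNAi are not specified here, so `gff_line` and all default values are hypothetical.

```python
def gff_line(seqid: str, start: int, end: int, strand: str, name: str,
             source: str = "E-RNAi", feature: str = "dsRNA_amplicon") -> str:
    """Format one reagent as a tab-delimited, GFF3-style feature line."""
    # GFF coordinates are 1-based and inclusive.
    if start < 1 or end < start:
        raise ValueError("coordinates must satisfy 1 <= start <= end")
    if strand not in {"+", "-", "."}:
        raise ValueError("strand must be '+', '-' or '.'")
    columns = [seqid, source, feature, str(start), str(end),
               ".",            # score: not set in this sketch
               strand,
               ".",            # phase: only meaningful for coding features
               f"Name={name}"]
    return "\t".join(columns)
```

Lines of this form can be loaded as a custom track in a genome browser to display reagents next to the annotated transcripts.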
The design of specific and potent long dsRNAs and siRNAs is of key importance for RNAi experiments. The E-RNAi web service provides a user-friendly and straightforward selection of RNAi design templates and implements all steps required for the prediction of specific and efficient RNAi reagents. Moreover, available designed reagents can be evaluated and re-annotated using E-RNAi or used as sequences to be avoided for independent design.
In addition to classical model organisms such as Caenorhabditis elegans and Drosophila melanogaster, E-RNAi incorporates organisms that were only recently sequenced and annotated. Experiments using RNAi in these organisms can benefit from the knowledge about e.g. quality parameters of RNAi reagents obtained from studies in classical systems.
The update of E-RNAi also implements new findings with regard to RNAi specificity and efficiency, including analysis of sequences for low-complexity regions and improved efficiency predictors. We will continue to update the web service as new results become available. Technically, the new database structure behind the web server greatly facilitates the addition of organisms for which genome and transcriptome sequences are available. The implementation of faster alignment tools (primarily Bowtie) significantly shortened the time to calculate designs and was also important to allow batch queries. The maximal size for batch queries will be further increased in the future, offering the possibility to e.g. re-annotate complete RNAi libraries online.
A comprehensive documentation of all E-RNAi input and output features as well as a detailed description of all settings is available in the E-RNAi Wiki at http://www.e-rnai.org/wiki, including tutorial sections e.g. about the calculation of independent designs.
Supplementary Data are available at NAR Online.
Studienstiftung (to T.H., PhD fellowship); Deutsche Forschungsgemeinschaft and the Helmholtz Association (partial); EC FP7 Program (Grant 201666). Funding for open access charge: Intramural funding.
Conflict of interest statement. None declared.
We are grateful to Thomas Sandmann for critical comments on the manuscript.