This computational study of mechanistic aspects of RNAi, set against the extensive transcriptome and genome information available for the nematode C.elegans, for H.sapiens (human) and (to a lesser extent) for S.pombe (fission yeast), indicated a considerable likelihood that the specificity of RNAi knockdown is compromised by off-target RNAi effects. The similarity of observations from organisms of a wide phylogenetic range (fungi to both protostome and deuterostome animals) suggests that the conclusions from our analyses may provide insights into general aspects of RNAi. The results reported here were derived from computational approaches only; experimental validation is impeded by the large (genome-size) scale of the sequence data considered in these analyses. However, the parameters used for the computational analyses were applied at a high stringency compared with the conditions that allow RNAi in vivo
. For instance, only sequence identity and minimal mismatch were considered to define siRNA specificity for a target sequence; bulge or wobble phenomena that relax sequence-specific target recognition by siRNA (22) were not allowed for. In addition, with a computational approach it was feasible to test parameters (such as lengths of dsRNA and especially siRNA) beyond the naturally occurring ranges to examine properties and trends of RNAi specificity. Our work does, however, suggest an empirical investigation that would be informative about both in vivo
off-target rates and the properties of the siRNA binding/knockdown process. In each of the organisms studied here, we have identified a number of siRNAs that have the highest potential for off-target effects, along with the predicted affected genes and predicted efficacy according to the rational design rules (http://rnai.cs.unm.edu/rnai/off-target/). An in vivo study of the knockdown produced by some or all of these siRNAs, assessed for the putative affected genes and controlled by monitoring predicted non-target genes (measured, e.g. by microarray analysis), could reveal whether the predicted off-target effects do, in fact, occur in living systems. The rates of off-target knockdown would help to calibrate the predicted rates in this paper. Furthermore, such an experiment would contribute evidence toward the current debate on whether efficacy is purely a function of the siRNA or also depends on the target molecule (45).
The algorithm used to detect sequence similarity as a parameter for off-target RNAi was designed specifically for use with short (siRNA) sequences, and it also accommodates dsRNA as a source of siRNA sequences. Thus, the analyses are relevant for both ways of inducing RNAi experimentally: introduction of dsRNA and introduction of siRNA (18). The algorithm is superior to the BLASTN algorithm (25), which is usually recommended to evaluate potential off-target properties of siRNA and dsRNA designs toward other genes (26). While a BLAST search offers some protection against off-target effects, and is certainly better than no control whatsoever, it is not, by itself, sufficient for general use for at least two reasons. First, the BLAST homology function was not developed specifically to model the RNAi-binding process and does not account for some of its known features. For example, mismatches and bulges are known to have differential effects on efficacy, varying along the length of the siRNA (22). Although BLAST allows for mismatch, insertion and deletion based on alignment cost, it cannot control the positions of these imperfect match patterns. Our algorithm is capable of modeling these patterns by controlling their length and positions, allowing it to detect off-target effects that would be missed by BLAST searches. Second, BLAST is suitable only when the entire genome sequence is available; in the absence of complete genome information, it is possible that significant off-target interactions will be missed. Although we, too, can search only complete genomes in the current work, we have quantified expected off-target error rates in a number of organisms, establishing a range of probable off-target rates. In an otherwise unsequenced organism, these bounds can be used to estimate the probability of off-target effects based on comparison of its genome size and evolutionary history. They can also be used to ameliorate such effects through multiple trials with varying siRNAs selected from the target gene. Using the off-target framework built in this work, we are able to develop quantitative models to predict off-target errors by incorporating a number of variables, such as genome size and chromosomal location of a target gene, in addition to the parameters we used in Materials and Methods. These models will provide reliable prediction of false positive error rates when an organism is only partially sequenced.
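The idea of controlling where mismatches are permitted can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names are hypothetical, sequences are compared in sense-strand orientation for simplicity, and bulges (insertions and deletions) are not modeled. It shows only how a matcher can accept a mismatch at chosen positions and reject it everywhere else, which an alignment-cost criterion alone does not express.

```python
from typing import Iterable, List, Set


def matches_with_tolerated_positions(sirna: str, site: str, tolerated: Set[int]) -> bool:
    """Return True if `site` equals `sirna` everywhere except, possibly,
    at the 0-based positions listed in `tolerated` (hypothetical helper)."""
    if len(sirna) != len(site):
        return False
    return all(a == b or i in tolerated
               for i, (a, b) in enumerate(zip(sirna, site)))


def find_sites(sirna: str, transcript: str, tolerated: Iterable[int] = ()) -> List[int]:
    """Slide the siRNA along a transcript and report the 0-based start of
    every site that matches under the position-restricted mismatch rule."""
    tol = set(tolerated)
    k = len(sirna)
    return [
        start
        for start in range(len(transcript) - k + 1)
        if matches_with_tolerated_positions(sirna, transcript[start:start + k], tol)
    ]


# Toy example with a 4-nt "siRNA"; real siRNAs would be about 21 nt.
transcript = "TTACGTTTACCTTT"
print(find_sites("ACGT", transcript))                 # exact matches only
print(find_sites("ACGT", transcript, tolerated={2}))  # mismatch allowed at position 2
```

In this toy case the second call reports an additional site that differs from the siRNA only at the tolerated position, whereas tolerating a different position would not recover it.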
We should note that our predictions neither include the effects of siRNA concentration nor attempt to account for the non-linear (synergistic or mutually interfering) interactions of a pool of siRNAs. It is clear that both these effects are of critical practical consequence and that a computational model supporting them is desirable. At the moment, however, there are insufficient published data on the efficacies of pools to construct a high-confidence model of pool effects. From some reports (15) it is clear that simplistic models, such as linear combinations weighted by concentration, are inadequate. Thus, the results in this paper do not attempt to model either concentration or non-linear siRNA pool effects. Our results should, therefore, be interpreted as the chance that any single siRNA arising from a chosen dsRNA has an off-target interaction within the genome. In practice, this may be an overestimate of true off-target effects, but it does still provide an indication of off-target genes that should be monitored for potential off-target repercussions.
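The per-siRNA interpretation described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of Materials and Methods: the dsRNA is tiled into every overlapping k-mer, only exact matches are counted, concentration and pool effects are ignored, and the names `build_kmer_index` and `off_target_fraction` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def build_kmer_index(transcripts: Dict[str, str], k: int = 21) -> Dict[str, Set[str]]:
    """Map every k-mer to the set of transcript names that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def off_target_fraction(dsrna: str, target: str,
                        index: Dict[str, Set[str]], k: int = 21) -> float:
    """Fraction of k-mers tiled from `dsrna` that also occur, as exact
    matches, in at least one transcript other than `target`."""
    sirnas = [dsrna[i:i + k] for i in range(len(dsrna) - k + 1)]
    if not sirnas:
        return 0.0
    # A k-mer counts as a hit if any transcript besides the target holds it.
    hits = sum(1 for s in sirnas if index.get(s, set()) - {target})
    return hits / len(sirnas)


# Toy transcriptome with k = 4 so the example stays readable.
toy = {"geneA": "AAAACCCCGGGG", "geneB": "TTTTCCCCGTTT"}
idx = build_kmer_index(toy, k=4)
print(off_target_fraction("AAAACCCCG", "geneA", idx, k=4))
```

Here two of the six 4-mers tiled from the toy dsRNA (CCCC and CCCG) also occur in geneB, so the printed fraction is one third.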
Remarkably, the examination of RNAi off-target error as a function of siRNA length disclosed that siRNA sequences of 21 nt, the length most often observed in vivo, optimally balanced target specificity and low chance of off-target RNAi. siRNA sequences of <21 nt had an increased chance of off-target effects, whereas longer sequences did not gain enough target specificity to significantly reduce off-target reactivity. This siRNA length effect suggests that the chance of off-target RNAi effects may increase with the use of artificial siRNA sequences of <21 nt, such as the 12–15 nt dsRNA fragments that result from RNase III digestion of dsRNA (48). The protozoan parasite Trypanosoma brucei employs comparatively long siRNA (24–26 nt) to target RNAi (39), perhaps for the benefit of gaining some critical specificity of RNAi. However, sufficient sequence data are lacking at this time to validly investigate the off-target dynamics for siRNA of various lengths in this organism.
Despite inherent properties that combine optimally for specific sequence-based recognition, 21 nt siRNA still have a considerable chance for off-target effects when considering all coding domains within a transcriptome. Not surprisingly, the incidence of off-target effects increased when sequence mismatch of up to nine consecutive residues between the siRNA and the potential targets was allowed for. Varying the position of these mismatches within the siRNA sequence changed the number of potential target sequences. Consistent with experimental observations (22), we found that off-target error rates corresponding to mismatches within the region of 2–9 nt at the 5′ end of the guide strand were significantly lower.
The off-target effects also increased following inclusion of upstream and downstream UTR sequences within the target sequences, to reflect the in vivo reality that complete mRNA transcripts (not just the protein-encoding sequences) can be attacked by RNAi. Although this increase was not significant (), this result suggests that the nucleotide usage in UTRs substantially differs from that in coding regions. Finally, off-target errors increased when using longer versus shorter lengths of dsRNA to generate the siRNA population. Intriguingly, our methods showed that dsRNA representing the region of the first 100 nt of the 5′ terminus of CDSs yielded the lowest chances for off-target effects. This particular region may differ between genes for proteins that function intracellularly and genes for proteins that are released extracellularly. The 5′ sequence of the latter category of genes encodes signal peptides or membrane anchors. The specific sequence constraints that ensure functionality of these domains (49) may subdivide the transcriptome into smaller populations of target sequences. Although the beginning of the CDS has lower off-target error, this region is not recommended for dsRNA design because it is rich in regulatory protein-binding sites (26). There is also empirical evidence that Dicer produces more siRNA toward the 3′ portion of the target gene (39). In all, the reduction in off-target error was not significantly different for dsRNA from the first 100 nt versus dsRNA representing residues 100–200 nt of CDSs.
Combined, the above computational findings suggest an extensive potential for off-target effects in RNAi experiments. However, in practice, chances for off-target errors may be less severe. RNAi targets mRNA for destruction and can only knock down genes that are expressed when siRNA is present. Potential off-target genes (that have adequate sequence identity to siRNA) will not be affected if they are not expressed simultaneously with the intended target gene. Our analysis showed that relatively few siRNAs target a sequence that is repeated frequently throughout the transcriptome of each of the organisms tested. In fact, siRNA designs can be screened for this property (http://rnai.cs.unm.edu/rnai/off-target/) to avoid the use of siRNA with an increased chance of off-target errors. Moreover, we determined the chance of off-target error for each gene within the transcriptome of C.elegans relative to its position on the physical map of the genome of this nematode. CDSs from chromosome regions that contain more densely packed genes had a lower probability of off-target RNAi, as observed in all chromosomes except chromosomes IV and V. This implied that densely packed genes generally employ more unique sequences within the genome of C.elegans
. Regardless, once a physical map is available for an organism, it may be possible to correlate the need to consider RNAi off-target error for a particular gene with the location of that gene within the genome. In addition, the results of the combined analysis suggest a trend in which the chance of off-target error is elevated for larger genomes. Of note, C.elegans and H.sapiens have roughly the same proportions of unique siRNAs (), but the off-target error rate in H.sapiens was much higher (). Sequence comparison showed that transposable elements may not be the source of these frequent 21mers, and the true origin remains to be determined.
Finally, several properties of siRNA sequences have been found to be associated with a high efficacy to cause RNAi. For instance, the relative thermodynamic stability of the sequence termini may determine how a double-stranded siRNA dissociates to correctly incorporate the negative RNA strand into the RISC complex (43). Such properties have been combined into rational design methods for improving siRNA efficacy (43). Implementation of rational design yielded a considerable reduction in the number of functional siRNA sequences derived from the transcriptomes of H.sapiens, thereby reducing the likelihood of off-target error. Statistical analysis showed that minimizing off-target error and enhancing siRNA efficacy can be performed independently.
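The terminal-stability property mentioned above can be illustrated with a deliberately crude proxy. The sketch below is not the rational design rule set of (43): it uses A/U content as a stand-in for lower duplex stability instead of nearest-neighbor free energies, and the window of 4 nt and the function names are assumptions made for illustration only. A positive score means that the duplex end carrying the 5′ terminus of the antisense strand is more A/U-rich than the opposite end, which is the direction usually associated with incorporation of the antisense strand.

```python
def au_count(seq: str) -> int:
    """Count A and U residues; T is treated as U so DNA-style input works."""
    return sum(base in "AU" for base in seq.upper().replace("T", "U"))


def terminal_asymmetry(sense: str, window: int = 4) -> int:
    """A/U count at the duplex end holding the antisense 5' terminus minus
    the A/U count at the end holding the sense 5' terminus.

    `sense` is the sense strand written 5'->3'. Its last `window` bases pair
    with the antisense 5' end, and its first `window` bases form the sense
    5' end. Overhangs are ignored in this simplified proxy.
    """
    sense_5p_end = sense[:window]
    antisense_5p_end = sense[-window:]
    return au_count(antisense_5p_end) - au_count(sense_5p_end)


# Two hypothetical 21-nt sense strands with opposite terminal composition.
print(terminal_asymmetry("GCGCACGUACGUACGUAAUUA"))  # A/U-rich antisense 5' end
print(terminal_asymmetry("AUUAACGUACGUACGUAGCGC"))  # A/U-rich sense 5' end
```

A score of this kind could be computed independently of any off-target screen, in keeping with the observation that the two criteria can be applied separately.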
In summary, experimental RNAi targeted by siRNA has a certain degree of specificity. However, off-target effects yielding unintentional knockdown of unrelated genes are probable. The random occurrence of some level of sequence identity (including imperfect match) between siRNA and multiple targets in a transcriptome contributes to this undesired effect. The computational methods applied here may underestimate the off-target effects because of the fairly stringent matching of sequence identity. Further studies will consider more relaxed rules for siRNA–target interaction, such as the bulge and wobble effects that occur in vivo. Although off-target effects can be reduced by minimizing sequence similarity with known transcripts and by rational design, it is recommended to include controls for specific targeting in RNAi experiments. Further understanding of siRNA will lead to more precise targeting of RNAi and reduce off-target effects, to the benefit of the study of gene function and other future applications of RNAi.