The discovery of RNA interference (RNAi) generated considerable interest in developing short interfering RNAs (siRNAs) for understanding basic biology and as the active agents in a new class of therapeutics. Early studies showed that selecting an active siRNA was not as straightforward as simply picking a sequence on the target mRNA and synthesizing the siRNA complementary to that sequence. As interest in applying RNAi has increased, the methods for identifying active siRNA sequences have evolved from focusing on the simplicity of synthesis and purification, to identifying preferred target sequences and secondary structures, to predicting the thermodynamic stability of the siRNA. As more specific details of the RNAi mechanism have been defined, they have been incorporated into more complex siRNA selection algorithms, increasing the reliability of selecting active siRNAs against a single target. Ultimately, design of the best siRNA therapeutics will require design of the siRNA itself, in addition to design of the vehicle and other components necessary for it to function in vivo. In this minireview, we summarize the evolution of siRNA selection techniques, with a particular focus on one issue of current importance to the field: how best to identify the siRNA sequences likely to have high activity. Approaches to designing active siRNAs through chemical and structural modifications are also highlighted. As the understanding of how to control the activity and specificity of siRNAs improves, the potential utility of siRNAs as human therapeutics will grow accordingly.
Issues of formulation, stability, delivery, and specificity are crucial for the development of any therapeutic. Nonetheless, it is essential to begin therapeutic development with the most active molecule possible so that the minimum dose can be used to achieve a therapeutic effect. For short interfering RNA (siRNA)-based therapeutics, identifying the most active sequences requires a thorough understanding of the molecular-level details of the RNA interference (RNAi) mechanism. Such an understanding would reveal which chemical and physical features of the siRNA are important for maximal activity; however, to date, many of these details remain unclear. Here, we briefly review how siRNA selection approaches have become more sophisticated as mechanistic details have emerged, and how further analysis of existing and new data can refine these approaches. We conclude with a discussion of how chemical and physical manipulations can be used to enhance the activity of a selected siRNA sequence.
Since the discovery and characterization of RNAi in C. elegans, the broad mechanistic details of the pathway have been largely established. Unlike in C. elegans, long double-stranded RNAs (dsRNAs) cannot be used to initiate RNAi in mammalian cells because they trigger the innate immune response. Therefore, siRNAs are used to initiate RNAi [3, 4], though these still have potential immunogenicity (see the companion minireview by Samuel-Abraham and Leonard, as well as [6, 7]). Nonetheless, siRNAs remain the most viable candidates for applying RNAi as a human therapeutic approach.
The basic mechanism for siRNA-initiated RNAi in humans is as follows. siRNAs are first delivered to the cytoplasm of the cells of interest, a non-trivial task, particularly in vivo (see the companion minireview by Shim and Kwon). The siRNAs are then recognized by the proteins of the human RISC loading complex (RLC): Dicer, Argonaute 2 (Ago2), and TAR RNA binding protein (TRBP). The RLC then selects one of the two strands to act as the guide strand [11–13], yielding the active RNA-induced silencing complex (RISC), which contains, at a minimum, the single-stranded guide RNA and Ago2 [14, 15]. (Recent work suggests that the human RLC may not function in exactly the same manner as the Drosophila RLC, indicating that careful study of the human system and proteins is essential for developing therapeutic siRNAs for human disease.) RISC then recognizes its target mRNA through complementarity between the guide strand and a region of the mRNA, and cleaves the mRNA at the center of the region of intermolecular hybridization. Silencing results from the normal degradation of previously expressed protein, which cannot be replaced because the levels of intact mRNA are reduced. Thus, RNAi provides a powerful tool for inhibiting the expression of any protein product with a relatively short half-life (< 12 h) whose expression level is primarily controlled by transcription rate.
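The target-recognition step can be illustrated with a minimal Python sketch. It assumes a perfectly complementary target and uses the commonly cited placement of the Ago2 cut between the target nucleotides paired with guide positions 10 and 11 (counting from the guide 5′-end), which falls at the center of a 19 nt hybridized region. The function names and the toy sequence are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the RNA reverse complement of seq, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def predicted_cleavage_site(mrna: str, guide: str) -> int:
    """Return index i such that the predicted cut lies between mrna[i-1] and mrna[i].

    `guide` is the base-paired region of the guide strand (5'->3', overhang
    removed). Assumes a perfectly complementary site and a cut between the
    target nucleotides paired with guide positions 10 and 11.
    """
    site = reverse_complement(guide)  # target site as it appears in the mRNA
    start = mrna.find(site)
    if start == -1:
        raise ValueError("no perfectly complementary site in this mRNA")
    # Guide position p (1-based) pairs with site position len(guide) - p + 1,
    # so guide positions 10 and 11 pair with site positions len-9 and len-10.
    return start + len(guide) - 10


site = "ACGUACGUACGUACGUACG"            # hypothetical 19 nt target site
mrna = "GGG" + site + "CCC"
guide = reverse_complement(site)
print(predicted_cleavage_site(mrna, guide))  # 12: 9 site nt upstream, 10 downstream
```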
It is important to note that siRNAs used for transient control of gene expression leverage an endogenous cellular control mechanism that is naturally used by microRNAs (miRNAs) [17–19]. As such, siRNAs do not have to activate a new pathway to function. While there is some concern that this machinery could be diverted from its normal roles by saturation with exogenous siRNAs, this would likely be a concern only for a chronic therapy, for which short hairpin RNAs (shRNAs), which are related in structure and function to both siRNAs and miRNAs, are more suitable. Further information about miRNAs, their roles in gene expression control, and their unique characteristics relative to siRNAs is available in dedicated reviews.
As the field of RNAi has grown, the rules for selecting candidate siRNA sequences have become more complex. The initial selection of agents for RNAi was based on complementarity of one strand of the dsRNA to the target mRNA. After the discoveries of Dicer and siRNAs, it became clear that the structure of the siRNA, with 19 internal base-paired nucleotides and a 2 nt overhang at each 3′-end (typically UU or TT), was also important for recognition by the pathway proteins. These structural considerations, together with the uniqueness of the target sequence within the known transcriptome of the organism and the simplicity and purity with which the selected sequence could be synthesized, served as the initial design considerations for siRNAs (see Figure 1 for other possible design variables for siRNAs that can be considered).
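These initial design considerations (a 19 nt paired region, 2 nt 3′ overhangs, and a target site unique in the transcriptome) can be sketched as a simple candidate generator. This is a minimal sketch under stated assumptions: the transcriptome is a hypothetical dictionary of sequences that includes the target mRNA itself, uniqueness is checked by exact match only (practical tools also consider near matches), and UU overhangs are used on both strands.

```python
from collections import Counter

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the RNA reverse complement of seq (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def candidate_sirnas(mrna, transcriptome, length=19, overhang="UU"):
    """Yield (start, passenger, guide) for target windows unique in the transcriptome.

    `transcriptome` maps hypothetical transcript names to sequences and must
    include `mrna` itself, so a unique window is counted exactly once.
    """
    # Count every window of the chosen length across all transcripts.
    counts = Counter(
        seq[i:i + length]
        for seq in transcriptome.values()
        for i in range(len(seq) - length + 1)
    )
    for i in range(len(mrna) - length + 1):
        target = mrna[i:i + length]
        if counts[target] != 1:
            continue  # window occurs in another transcript (or repeats): skip it
        passenger = target + overhang                  # sense strand, 5'->3'
        guide = reverse_complement(target) + overhang  # antisense (guide) strand, 5'->3'
        yield i, passenger, guide


transcriptome = {
    "geneA": "ACGUACGUACGUACGUACGUA",
    "geneB": "CGUACGUACGUACGUACGU",  # shares the window starting at index 1 of geneA
}
for start, passenger, guide in candidate_sirnas(transcriptome["geneA"], transcriptome):
    print(start, passenger, guide)  # candidates start at 0 and 2 only
```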
Another critical feature that subsequently came to light was that siRNAs must possess a 5′-PO3 rather than a 5′-OH, the typical terminal group of chemically synthesized siRNAs. This 5′-PO3 group is important for recognition of the siRNA by Dicer, as 5′-OCH3 and other modified strands are bound far more weakly than phosphorylated strands [24, 25]. Fortunately, unphosphorylated siRNAs are rapidly phosphorylated by Clp1 upon entry into the cytoplasm. Because phosphorylation is required, modifying the passenger strand with a 5′-methoxy group can block its phosphorylation and thereby prevent its incorporation into RISC.
It soon became clear that not all siRNAs silence their target with the same efficiency, so the rules for selecting active species were strengthened by generating and analyzing data on large sets of siRNAs [28–31]. The earliest rules focused on the siRNA alone, with positional base preferences as the dominant factors [28, 31]. It was also proposed that siRNAs whose guide strand would not form a stable secondary structure would be preferred [32, 33], though this remains in question. The accessibility of the target region on the mRNA, as determined by mRNA secondary structure prediction [35–37], has also been found to be important in determining siRNA function [38, 39]. In general, we found that accessibility at the 5′-end and 3′-end of the target region, based on the minimum free energy structure prediction, was preferable to accessibility in the center of the target or no accessibility, and that this effect was independent of guide strand structure. Regardless of the method used for secondary structure prediction, accounting for target secondary structure is clearly valuable in selecting siRNAs with maximal activity. This parallels earlier findings on the effect of mRNA structure on antisense oligonucleotide activity [40–42].
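Target accessibility can be summarized from a predicted structure. The sketch below assumes the mRNA's minimum free energy structure is already available in dot-bracket notation (for example, from a folding program), treats "." as unpaired, and uses hypothetical 4 nt windows and a 0.5 threshold to label a site as accessible at its ends, in its center, or neither. These choices are illustrative only and are not the criteria used in the studies cited above.

```python
def unpaired_fraction(segment: str) -> float:
    """Fraction of positions marked '.' (unpaired) in a dot-bracket segment."""
    return segment.count(".") / len(segment)


def accessibility_profile(structure: str, start: int, length: int = 19,
                          window: int = 4) -> dict:
    """Unpaired fraction at the 5' end, center, and 3' end of a target site.

    `structure` is a dot-bracket string for the whole mRNA (assumed to come
    from an MFE folding program); `start` is the 0-based index of the site.
    """
    site = structure[start:start + length]
    if len(site) != length:
        raise ValueError("target site extends past the end of the structure")
    mid = (length - window) // 2
    return {
        "5prime": unpaired_fraction(site[:window]),
        "center": unpaired_fraction(site[mid:mid + window]),
        "3prime": unpaired_fraction(site[-window:]),
    }


def classify_accessibility(profile: dict, threshold: float = 0.5) -> str:
    """Hypothetical labels: 'ends', 'center', or 'none'."""
    if profile["5prime"] >= threshold and profile["3prime"] >= threshold:
        return "ends"
    if profile["center"] >= threshold:
        return "center"
    return "none"


hairpin = "....((((...))))...."  # toy 19 nt site: open ends, short hairpin inside
profile = accessibility_profile(hairpin, 0)
print(profile, classify_accessibility(profile))
```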
Because siRNAs are double-stranded, either strand can serve as the guide in active RISC. Thus, to maximize siRNA activity, it is advantageous for the intended guide strand to be loaded preferentially into RISC. This preference for loading one strand over the other is referred to as siRNA asymmetry. Based on early studies in Drosophila, it was proposed that siRNA asymmetry arises from the difference in hybridization free energy of the terminal four nucleotides at each end of the siRNA [12, 43]: the strand whose 5′-end lies at the less stably hybridized end of the siRNA is preferentially loaded into active RISC. This was confirmed using sequences with terminal mismatches that induce significant instability at one end of the siRNA. Subsequently, thermodynamic asymmetry was confirmed to be a useful predictor of siRNA function.
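Thermodynamic asymmetry can be computed with a nearest-neighbor model. The sketch below assumes approximate RNA Watson–Crick stacking free energies at 37 °C, as commonly tabulated from Xia et al. (1998) (verify against a curated table before use), and takes the 19 nt paired region of the passenger strand (identical to the target site) as input. It defines ΔΔG as ΔG at the guide 5′-end minus ΔG at the passenger 5′-end, so a positive value means the guide 5′-end is the less stable end; this sign convention, the parameter `n` (number of terminal nearest neighbors, 1 to 4 as in the text), and the function names are hypothetical choices.

```python
# Approximate RNA Watson-Crick nearest-neighbor stacking free energies at 37 C
# (kcal/mol), keyed by the 5'->3' dinucleotide on either strand. Values as
# commonly tabulated from Xia et al. (1998); treat them as assumptions.
NN_DG = {
    "AA": -0.93, "UU": -0.93,
    "AU": -1.10,
    "UA": -1.33,
    "CU": -2.08, "AG": -2.08,
    "CA": -2.11, "UG": -2.11,
    "GU": -2.24, "AC": -2.24,
    "GA": -2.35, "UC": -2.35,
    "CG": -2.36,
    "GG": -3.26, "CC": -3.26,
    "GC": -3.42,
}


def end_dg(passenger: str, n: int, guide_end: bool) -> float:
    """Sum of the n terminal stacks at one end of the duplex.

    `passenger` is the 19 nt paired region of the passenger strand (5'->3').
    The guide 5' end pairs with the passenger 3' end, and a stack has the
    same energy whichever strand is used to name it.
    """
    if not 1 <= n <= len(passenger) - 1:
        raise ValueError("n must be between 1 and len(passenger) - 1")
    last = len(passenger)
    if guide_end:
        stacks = [passenger[last - 2 - i:last - i] for i in range(n)]
    else:
        stacks = [passenger[i:i + 2] for i in range(n)]
    return sum(NN_DG[s] for s in stacks)


def delta_delta_g(passenger: str, n: int = 1) -> float:
    """dG(guide 5' end) - dG(passenger 5' end); positive favors loading the guide."""
    return end_dg(passenger, n, guide_end=True) - end_dg(passenger, n, guide_end=False)


site = "GCGC" + "A" * 11 + "AUAU"  # hypothetical 19 nt target site
for n in (1, 2, 3, 4):
    print(n, round(delta_delta_g(site, n), 2))
```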
Although the existence and importance of asymmetry are not in question, how best to predict asymmetry has since received considerable attention, and two largely parallel viewpoints have been adopted: terminal sequence and terminal stability. For fully hybridized siRNAs, the thermodynamic stability of the termini is a function of the terminal sequence. Therefore, the sequence (as suggested by analyses of positional base preferences), the stability (as suggested by thermodynamic calculations), or both could be the driving force for asymmetry. Moreover, the strategy for calculating thermodynamic asymmetry is not fully settled [44–46], in particular how many base pairs or nearest neighbors to take into account in the calculation. Our previous analyses suggested that ultimate silencing activity could be reliably predicted by simple classification of the 5′ nucleotide on each strand. Supporting a contribution of terminal sequence to eventual function, biochemical and structural studies have demonstrated terminal nucleotide preferences for RNA binding and processing by Dicer and Ago2 [47, 48]. It is therefore worthwhile to analyze further the relative utility of terminal thermodynamics and terminal nucleotide sequence for predicting eventual siRNA function.
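Classification of the 5′ nucleotide on each strand can be illustrated with a simple two-letter scheme. The scheme used here, W (A or U) versus S (G or C) for the guide and passenger 5′ nucleotides, is a hypothetical stand-in, since the exact classes used in the earlier analysis are not given in this section. The guide 5′ nucleotide is inferred as the complement of the last base of the passenger strand's 19 nt paired region.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def base_class(base: str) -> str:
    """'W' for weakly pairing A/U, 'S' for strongly pairing G/C."""
    return "W" if base in ("A", "U") else "S"


def terminal_class(passenger: str) -> str:
    """Hypothetical class 'guide/passenger' from each strand's 5' nucleotide.

    `passenger` is the 19 nt paired region (5'->3'); the guide 5' nucleotide
    is the complement of the passenger's 3'-terminal paired nucleotide.
    """
    guide_5p = COMPLEMENT[passenger[-1]]
    passenger_5p = passenger[0]
    return f"{base_class(guide_5p)}/{base_class(passenger_5p)}"


print(terminal_class("GCGC" + "A" * 11 + "AUAU"))  # W/S
```

Under this hypothetical scheme, W/S is the class that the asymmetry rule described above would predict to favor loading of the intended guide strand.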
To do this, we analyzed two available databases of siRNA function [30, 31]. Using information theory, we quantified the reduction in entropy of the activity data obtained when classifying siRNAs by terminal nucleotide identity (as in our previous analysis) vs. by ΔΔG calculated with 1, 2, 3, or 4 terminal nearest neighbors. A reduction in entropy indicates a reduction in the scatter of the data and hence a variable that is a useful predictor of activity. Examined individually, all five prediction strategies provide predictive information (Table 1), with terminal nucleotide classification providing the best predictive accuracy for both datasets. Notably, among the ΔΔG calculations, using only the terminal nearest neighbor on each end of the siRNA provides the best predictive accuracy (Table 1, bold), echoing the results of others. However, in all cases the entropy reductions, though statistically significant (via bootstrapping, data not shown), are rather modest.
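The entropy-reduction analysis can be sketched in plain Python. This is a minimal illustration, assuming that activities and ΔΔG values have already been discretized into bins (the bin edges and toy data are hypothetical). The entropy reduction H(activity) − H(activity | predictor) equals the mutual information between activity and predictor, in bits.

```python
import bisect
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(target, predictor) -> float:
    """H(target | predictor): entropy left in target once predictor is known."""
    groups = {}
    for t, p in zip(target, predictor):
        groups.setdefault(p, []).append(t)
    n = len(target)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def entropy_reduction(target, predictor) -> float:
    """H(target) - H(target | predictor), i.e. mutual information in bits."""
    return entropy(target) - conditional_entropy(target, predictor)


def bin_values(values, edges):
    """Map continuous values (e.g. ddG) to bin indices using sorted edges."""
    return [bisect.bisect_right(edges, v) for v in values]


# Toy data (hypothetical): activity classes and two candidate predictors.
activity = ["high", "high", "low", "low"]
seq_class = ["W/S", "W/S", "S/W", "S/W"]              # separates activity perfectly
ddg_bins = bin_values([0.5, -0.5, 0.5, -0.5], [0.0])  # unrelated to activity
print(entropy_reduction(activity, seq_class))  # 1.0
print(entropy_reduction(activity, ddg_bins))   # 0.0
```

Significance could then be assessed by resampling, for example by bootstrapping the (activity, predictor) pairs, as mentioned in the text.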
We therefore examined the entropy reduction when using both terminal nucleotide classification and ΔΔG calculation to predict siRNA function (Table 2). Using both approaches together reduces the entropy substantially, and synergistically, in all cases. That each classification mode carries independent information is evident from the low (near-zero) redundancies between terminal nucleotide classification and each of the ΔΔG calculations (Table 2, column 5). Interestingly, when terminal nucleotide classification is also used, the best predictive accuracy for both datasets is achieved with the 3 nearest neighbor ΔΔG calculation (Table 2, bold). This analysis, which is consistent between the two datasets, shows that classifying siRNAs by both terminal sequence and asymmetry in terminal stability predicts function more accurately than either technique alone. This point is emphasized by the data sorted using nucleotide classification and the 3 nearest neighbor ΔΔG calculation (Figure 2). There are clear and distinct trends both horizontally and vertically, so sequences that appear in the upper left-hand corner of the figure are the most likely to be highly active. This further supports the conclusion that terminal sequence and terminal stability each provide unique, useful information for predicting siRNA activity.
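Combining two predictors and measuring their overlap can be sketched in the same way. In this sketch the joint predictor is simply the pair (sequence class, ΔΔG bin); "synergy" is the joint information minus the sum of the individual contributions, and "redundancy" is approximated by the mutual information between the two predictors. These definitions are assumptions for illustration and may differ from those used for Table 2.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy, in bits, of a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(a, b) -> float:
    """I(a; b) = H(a) + H(b) - H(a, b), in bits."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def combined_analysis(activity, seq_class, ddg_bin) -> dict:
    """Information about activity from each predictor alone and from both together."""
    joint = list(zip(seq_class, ddg_bin))  # the pair acts as one combined predictor
    i_seq = mutual_information(activity, seq_class)
    i_ddg = mutual_information(activity, ddg_bin)
    i_joint = mutual_information(activity, joint)
    return {
        "seq": i_seq,
        "ddg": i_ddg,
        "joint": i_joint,
        "synergy": i_joint - i_seq - i_ddg,  # > 0: the pair is worth more than the parts
        "redundancy": mutual_information(seq_class, ddg_bin),  # overlap of predictors
    }


# Hypothetical XOR-like toy data: neither predictor alone is informative.
activity = ["low", "high", "high", "low"]
seq_class = ["W/S", "W/S", "S/W", "S/W"]
ddg_bin = [0, 1, 0, 1]
print(combined_analysis(activity, seq_class, ddg_bin))
```

The toy data show the synergistic extreme: each predictor alone carries no information about activity, the two predictors do not overlap, and together they determine activity completely.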
Recent results suggest that highly active siRNAs are likely to have lower internal stabilities than less active siRNAs. Lower internal stabilities were found to correspond to lower siRNA GC content and limited secondary structure in both the target and the guide strand, all of which are known to be important factors in maximizing function. Other results showed that the internal stability of siRNAs can vary along their length. Because the passenger strand of the siRNA is known to be cleaved by Ago2 to free the guide strand, this profile of variable internal stability may reflect two requirements: the center of the siRNA must be stably hybridized to allow cleavage of the passenger strand by Ago2, but both 3′-ends of the cleaved passenger strand should be relatively unstable to encourage separation from the guide strand after cleavage.
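An internal stability profile and GC content can be computed from the same kind of nearest-neighbor model. The sketch below assumes approximate Turner-model stacking energies at 37 °C (as commonly tabulated from Xia et al., 1998; verify before use) and sums them over a sliding window of four stacks; the window size and the toy sequence are hypothetical choices. Less negative window sums indicate less stably hybridized segments.

```python
# Approximate RNA Watson-Crick stacking free energies at 37 C (kcal/mol),
# keyed by the 5'->3' dinucleotide; assumed values, verify before use.
NN_DG = {
    "AA": -0.93, "UU": -0.93, "AU": -1.10, "UA": -1.33,
    "CU": -2.08, "AG": -2.08, "CA": -2.11, "UG": -2.11,
    "GU": -2.24, "AC": -2.24, "GA": -2.35, "UC": -2.35,
    "CG": -2.36, "GG": -3.26, "CC": -3.26, "GC": -3.42,
}


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in seq."""
    return sum(base in "GC" for base in seq) / len(seq)


def stability_profile(passenger: str, window: int = 4) -> list:
    """Sliding-window sums of nearest-neighbor stacking energies (kcal/mol).

    Each entry covers `window` consecutive stacks (window + 1 base pairs)
    along the paired region, read 5'->3' on the passenger strand.
    """
    stacks = [NN_DG[passenger[i:i + 2]] for i in range(len(passenger) - 1)]
    return [sum(stacks[i:i + window]) for i in range(len(stacks) - window + 1)]


site = "GCGC" + "A" * 11 + "AUAU"  # hypothetical 19 nt target site
print(round(gc_content(site), 3))
print([round(v, 2) for v in stability_profile(site)])
```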
Ultimately, the best siRNA sequence will be determined by the sequence and structure of the target as well as the sequence, structure, and asymmetry of the siRNA. However, once the best siRNA sequence has been selected, its function can still be enhanced by incorporating a variety of chemical and structural modifications that improve its performance with respect to a particular design variable, e.g., biological half-life. These modifications will be important for generating siRNAs with in vivo efficacies and specificities sufficient for therapeutic use. The variables that can be manipulated include the nucleotide chemistry of the ribose, base, or phosphate; the length of the siRNA; and the sequence, structure, and chemistry of the overhangs. The first priority for nucleic acid-based therapeutics, especially RNAs, is maintaining their integrity in the presence of ubiquitous nucleases. Many chemical modifications that mitigate degradation by RNases have been examined [50, 51], many of which were first used in the development of antisense oligonucleotides [52–56]. Other properties that can be manipulated using chemical modifications include strand selection, off-target effects, and cellular distribution, as reviewed elsewhere.
Structural modifications can also be effective at altering siRNA properties. While typical siRNAs have a 19 nt paired region and 2 nt 3′ overhangs, both longer and shorter siRNAs have been shown to initiate silencing. Longer duplexes that can serve as Dicer substrates can silence more efficiently than standard-length siRNAs of identical sequence, as would be expected from the close contact between Dicer and Ago2 in the structure of the RLC. Mismatches, while useful for inducing asymmetry, may not be a practical approach for generating sequences of maximal activity, as mismatched siRNAs are bound less strongly by TRBP than fully paired sequences. An interesting manipulation of siRNA structure is the use of segmented structures. For instance, small internally segmented interfering RNAs (sisiRNAs) were developed that consist of an intact guide strand and a passenger strand split into two segments. sisiRNAs modified with selected locked nucleic acid (LNA) nucleotides were found to be more tolerant of chemical modifications than standard siRNAs. Silencing has also been achieved using siRNAs containing DNA segments in both the guide and passenger strands, though it is important to maintain a primarily A-form duplex to ensure recognition by the dsRNA-binding domains of TRBP and Dicer.
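Mismatched designs, such as the terminal mismatches used to induce asymmetry, can be checked by locating non-Watson–Crick positions in the paired region. The sketch below is a minimal illustration: it takes both strands as 19 nt paired regions written 5′→3′ with overhangs removed, counts G·U wobble pairs as mismatches (an assumption), and includes a hypothetical helper that introduces a mismatch at the passenger 3′ terminus, opposite the guide 5′-end.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the RNA reverse complement of seq, read 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def mismatch_positions(passenger: str, guide: str) -> list:
    """1-based passenger positions not Watson-Crick paired with the guide.

    Both strands are the paired regions (5'->3', overhangs removed); G-U
    wobble pairs are counted as mismatches here (an assumption).
    """
    if len(passenger) != len(guide):
        raise ValueError("paired regions must have the same length")
    n = len(passenger)
    # Passenger position i (0-based) pairs with guide position n - 1 - i.
    return [i + 1 for i in range(n) if COMPLEMENT[guide[n - 1 - i]] != passenger[i]]


def with_terminal_mismatch(passenger: str) -> str:
    """Hypothetical helper: mismatch the passenger 3'-terminal base.

    Replacing a base with its own complement makes it identical to the guide
    base opposite it (A-A, U-U, G-G, or C-C), which is never a Watson-Crick pair.
    """
    return passenger[:-1] + COMPLEMENT[passenger[-1]]


passenger = "ACGUACGUACGUACGUACG"  # hypothetical 19 nt paired region
guide = reverse_complement(passenger)
print(mismatch_positions(passenger, guide))                          # []
print(mismatch_positions(with_terminal_mismatch(passenger), guide))  # [19]
```

Position 19 of the passenger pairs with the guide 5′ nucleotide, so this mismatch destabilizes the end that the asymmetry rule associates with preferential guide loading.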
It is important to note that multiple siRNA manufacturers now offer libraries of siRNA sequences targeting all or nearly all of the known expressed genes in common organisms (e.g., human, mouse, and rat). Many of these contain known or proprietary chemical and structural modifications to enhance their activity or reduce off-target effects. While these libraries are valuable resources for those seeking siRNAs as laboratory tools, identifying and manipulating siRNAs to achieve maximal activity is still an important task. The available siRNAs may work adequately, but without clear mechanistic information about how best to select and modify siRNAs, it is unclear whether the available sequences are the best choice for the specific requirements of a given target and application. When this level of control has been achieved, i.e., when an siRNA can be designed taking into account the uniqueness of a particular target and application, the field will have reached the maturity necessary for general consideration as a therapeutic strategy.
Financial support for this work was provided in part by Michigan State University, the National Science Foundation (CBET 0941055), the National Institutes of Health (GM079688, RR024439, GM089866), the Michigan Universities Commercialization Initiative (MUCI), and the Center for Systems Biology.