Several groups and companies have described the design and construction of genome-scale shRNA libraries (Table ).
Properties of shRNA libraries discussed in this review
These libraries include the Netherlands Cancer Institute (NKI) libraries [1], the RNAi Consortium (TRC) libraries [16], the Hannon–Elledge libraries [18], and the System Biosciences (SystemBio) libraries. They differ in size, coverage, and shRNA sequence design and, most importantly, in the strategy used to generate the miRNA-mimicking sequences for gene silencing. The NKI, TRC, and SystemBio libraries use RNA polymerase III (RNA Pol III) to express simple hairpin RNAs that mimic pre-miRNAs for RNAi (for a review of miRNA biogenesis, see [21]). The Hannon–Elledge libraries, on the other hand, use RNA polymerase II (RNA Pol II) to express hairpin RNAs in the context of a natural miRNA, thereby mimicking the pri-miRNA for RNAi. Owing to these differences, each library has unique characteristics, which we discuss below.
The NKI shRNA library
The NKI shRNA libraries [1] use the RNA Pol III promoter H1 to express shRNAs that consist of a 19-nt double-stranded stem and a 9-nt loop. Once expressed in cells, the pre-miRNA-like shRNAs are processed into functional siRNAs that have a 19-bp double-stranded RNA region and 2-nt overhangs on each end. The libraries were generated in pRSC, a murine stem cell virus (MSCV)-based self-inactivating retroviral vector that also contains a puromycin selectable marker. The NKI shRNA library is the smallest of the libraries discussed here, with ~54,000 shRNAs in total. It targets ~8000 human genes with three shRNAs per gene and ~15,000 mouse genes with two shRNAs per gene.
The target-specific 19-nt sequences in the NKI libraries were selected using the following criteria: (i) target the mRNA coding sequence; (ii) target all RefSeq transcript variants of the gene; (iii) no sequence overlap among shRNAs targeting the same gene; (iv) minimal sequence similarity to other genes; (v) abide by the thermodynamic asymmetry rules governing the incorporation of the correct RNA strand into the RNA-induced silencing complex (RISC); (vi) begin with a G or C after an AA dimer in the 5′-flanking sequence; (vii) end prior to a TT, TG, or GT doublet in the 3′-flanking sequence; (viii) no stretches of four or more T or A; (ix) 30%–70% G + C content; and (x) no EcoRI or XhoI sites. Because the NKI library was designed when knowledge of miRNA biogenesis was relatively incomplete, these rules do not include all of the constraints now known for optimal shRNA design.
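The purely sequence-based criteria (vi) to (x) lend themselves to a simple filter. The Python sketch below is illustrative only and is not the code used to build the NKI library. The function names are hypothetical, sequences are written in the DNA alphabet, and criterion (viii) is interpreted here as "no run of four or more identical A or T bases", which is an assumption. Criteria (i) to (v) require transcript annotation and thermodynamic calculations and are not covered.

```python
import re

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence
XHOI_SITE = "CTCGAG"   # XhoI recognition sequence


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def passes_sequence_filters(target, flank5, flank3):
    """Check criteria (vi)-(x) for one candidate 19-nt target.

    flank5 / flank3 are the mRNA bases immediately upstream and
    downstream of the target (hypothetical argument names).
    """
    target, flank5, flank3 = target.upper(), flank5.upper(), flank3.upper()
    if len(target) != 19 or set(target) - set("ACGT"):
        return False
    # (vi) target begins with G or C, directly after an AA dimer
    if not flank5.endswith("AA") or target[0] not in "GC":
        return False
    # (vii) target ends directly before a TT, TG, or GT doublet
    if flank3[:2] not in {"TT", "TG", "GT"}:
        return False
    # (viii) no homopolymer run of four or more A or T (assumed reading)
    if re.search(r"A{4,}|T{4,}", target):
        return False
    # (ix) G + C content between 30% and 70%
    if not 0.30 <= gc_fraction(target) <= 0.70:
        return False
    # (x) no EcoRI or XhoI site inside the target
    if ECORI_SITE in target or XHOI_SITE in target:
        return False
    return True
```

For example, `passes_sequence_filters("GCATGCTAGCTAGGATCCA", "CAA", "TTG")` accepts the candidate, whereas the same call with a `TTTT` run inside the target rejects it.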
Each 19-nt sequence was synthesized as two complementary 60-nt oligos that contain the 19-nt stem region, a nine-base loop, a termination sequence, and BglII and HindIII sites. For library construction, pairs of complementary oligos were annealed and cloned into HindIII/BglII-digested pRSC in 96-well plate format. The shRNA expression cassette containing the H1 promoter and the hairpin construct is flanked by EcoRI and XhoI sites and can be shuttled into different vectors.
The TRC shRNA library
The TRC shRNA libraries [16] use the RNA Pol III promoter U6 to express shRNAs that contain a 21-nt double-stranded stem and a 6-nt loop. The pre-miRNA-like shRNAs are then processed into functional siRNAs in cells. The libraries were cloned in the lentiviral vector pLKO.1, which contains a puromycin selectable marker. The TRC shRNA library is the largest of the libraries discussed here, consisting of ~300,000 shRNAs targeting ~60,000 human and mouse genes with five shRNAs per gene.
The 21-nt sequences in the TRC libraries were selected with the following procedure. For each RefSeq transcript, all 21-nt sequences were generated in silico from 25 bp after the coding sequence start site to 150 bp before the end. Each sequence was scored using a set of bioinformatics criteria for knockdown potential and ease of synthesis. The top 100 sequences were BLASTed against the NCBI RefSeq and UniGene databases for specificity, such that each shRNA must have >3 mismatches to all other RefSeqs, with at least two of the mismatches between positions 3 and 19. The top four 21-mers from the CDS and the best one from the 3′ UTR were selected for library construction.
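The tiling and specificity steps can be illustrated with a short Python sketch. This is not the TRC pipeline: the function names are hypothetical, the "end" is assumed to be the end of the coding sequence, coordinates are assumed to be 0-based with an exclusive end, and the BLAST search is replaced by an ungapped position-by-position comparison of two 21-mers. The scoring step for knockdown potential is not described in enough detail here to be sketched.

```python
def candidate_21mers(transcript, cds_start, cds_end, k=21,
                     skip_start=25, skip_end=150):
    """Yield (start, k-mer) for every window lying between
    cds_start + skip_start and cds_end - skip_end.

    Coordinates are 0-based; cds_end is exclusive (assumed convention).
    """
    first = cds_start + skip_start
    last = cds_end - skip_end - k  # last permissible window start
    for pos in range(first, last + 1):
        yield pos, transcript[pos:pos + k]


def mismatch_positions(a, b):
    """1-based positions at which two equal-length sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


def sufficiently_different(candidate, other):
    """Apply the stated rule to one candidate/off-target pair:
    more than three mismatches overall, at least two of them
    between positions 3 and 19 (inclusive)."""
    mismatches = mismatch_positions(candidate, other)
    core = [p for p in mismatches if 3 <= p <= 19]
    return len(mismatches) > 3 and len(core) >= 2
```

In a real workflow, `sufficiently_different` would be evaluated against every near-match reported by the alignment tool, and a candidate would be kept only if it passes for all of them.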
For library construction, complementary 65-nt oligos were synthesized that contain the 21-nt shRNA stem region, a six-base loop with an XhoI site, a TTTTT termination sequence, and overhangs compatible with EcoRI and AgeI sites. Annealed oligos were ligated into the EcoRI- and AgeI-digested pLKO.1 vector in 96-well format and sequence-verified.
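The core of such a hairpin oligo can be assembled programmatically. The Python sketch below is a simplified illustration, not the TRC oligo design: it assumes a sense-loop-antisense order, uses the XhoI recognition sequence CTCGAG as the six-base loop, and omits the cloning overhangs and any additional flanking bases, so its output is shorter than the 65-nt oligos described above. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hairpin_insert(sense, loop="CTCGAG", terminator="TTTTT"):
    """Concatenate sense stem, loop, antisense stem, and terminator.

    The strand order and the absence of restriction-site overhangs
    are simplifying assumptions for illustration.
    """
    sense = sense.upper()
    return sense + loop + reverse_complement(sense) + terminator
```

With a 21-nt stem, `hairpin_insert` returns a 53-nt sequence (21 + 6 + 21 + 5) whose two stem arms are reverse complements of each other and can therefore fold back into a hairpin.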
The Hannon–Elledge shRNA libraries
The first generation of the Hannon–Elledge shRNA library used a design similar to that of the NKI and TRC libraries [19]. With advances in the understanding of miRNA biogenesis, this was replaced by second-generation libraries that adopt a completely different design from other shRNA libraries [18]. The second-generation libraries use RNA Pol II or Pol III to express shRNAs in the context of a natural miRNA, miR-30 (125 bases 5′ and 3′ of the pri-miR-30 sequence). This strategy was shown to be up to 12-fold more efficient than pre-miRNA hairpin designs in producing mature siRNA, and was therefore more effective in gene silencing. More importantly, the use of an RNA Pol II promoter greatly increases the flexibility of this library. For example, promoters of different strengths can be used to achieve optimal RNAi in different cell types or tissues; inducible promoters can be used to control the timing and severity of RNAi; and reporter genes can be expressed concomitantly with the shRNA to label cells that received RNAi.
The second-generation Hannon–Elledge libraries contain ~87,000 shRNAs against all human genes and ~76,000 shRNAs against all mouse genes, with ~2–3 shRNAs per gene. These libraries exist in several different vectors, including the retroviral vector pSM2, the MSCV-based retroviral vector pSMP (also named MSCV-PM), the lentiviral vector pGIPZ, and the inducible lentiviral vector pTRIPZ (discussed below). The 22-nt shRNA sequences were designed by Rosetta Inpharmatics (Kirkland, USA) using proprietary algorithms developed from empirical testing of thousands of siRNAs. The algorithm also introduced additional positional biases and thermodynamic rules suggested by analysis of siRNA and endogenous miRNA incorporation into the RISC.
To synthesize the library, 97-nt shRNAmir templates containing the 22-nt shRNA stem region, a 15-nt miR-30 loop, and flanking 5′ and 3′ miR-30 sequences were synthesized on printed microarrays, cleaved off, and amplified as a pool by polymerase chain reaction (PCR) using universal primers. Pools of 22,000 shRNAmirs were inserted between the XhoI and EcoRI sites in the pSM2 vector. The pSM2 shRNA plasmids were then entered into a high-throughput sequencing pipeline, and clones with unique and correct shRNA sequences were retained individually in 96-well plates. For each pool, sequencing progress was monitored dynamically and stopped when the accumulation of new clones slowed down. At that point, new arrays were synthesized to produce additional shRNAmir pools. When the number of shRNA clones for a given gene exceeded three, no additional shRNAs for this gene were made. With this iterative synthesis, 87,283 verified human shRNA clones and 76,896 verified mouse shRNA clones have been produced so far. The libraries were re-cloned into the different vectors mentioned above. These vectors allow the Hannon–Elledge libraries to be used for almost any kind of assay and thus make them the most flexible and versatile shRNA libraries currently available.
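The per-gene stopping rule in this iterative scheme can be expressed as a small bookkeeping step. The Python sketch below is a hypothetical illustration rather than the pipeline that was actually used: the function name, the `(gene, sequence)` record format, and the gene symbols in the usage note are assumptions, and the threshold follows the literal wording above (a gene is dropped from the next round once it has more than three unique verified clones).

```python
from collections import defaultdict


def genes_for_next_round(verified_clones, all_genes, threshold=3):
    """Return the genes that should still be included on the next array.

    verified_clones: iterable of (gene, shRNA sequence) pairs for clones
    that passed sequence verification (assumed record format).
    A gene is retained while its count of unique verified sequences
    does not exceed `threshold`.
    """
    unique = defaultdict(set)
    for gene, seq in verified_clones:
        unique[gene].add(seq.upper())  # duplicate clones count once
    return sorted(g for g in all_genes if len(unique.get(g, ())) <= threshold)
```

Given four distinct verified clones for one gene and a single clone for another, only the second gene (and any gene with no clones yet) would be carried forward to the next round of array synthesis.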
The SystemBio shRNA libraries
Like the NKI library, the SystemBio shRNA libraries use the H1 RNA Pol III promoter to express pre-miRNA-like shRNAs. The shRNAs consist of a 27-nt double-stranded stem and a 12-nt loop. They were designed with a proprietary algorithm and cloned into the System Biosciences pSIH-1-H1, pSIF-1-H1, or pGreenPuro vectors. The SystemBio libraries exist only in pooled format: (i) human genome shRNA library: ~200,000 shRNAs targeting ~50,000 human genes; (ii) mouse genome library: ~150,000 shRNAs targeting ~40,000 mouse genes; (iii) human apoptosis library: 6876 shRNAs targeting 597 human apoptosis genes; (iv) human kinase library: 10,453 shRNAs targeting 897 human kinase genes; (v) human phosphatase library: 2719 shRNAs targeting 244 human phosphatase genes. These libraries have not been sequence-validated.
Genome wide vs. focused libraries
For practical reasons, it is often desirable to screen only a subset of genes in the genome, such as kinases, phosphatases, transcription factors, or epigenetic enzymes. The TRC, Hannon–Elledge, and SystemBio libraries are all available as ‘gene family’ or ‘pathway’ shRNA libraries. These ‘focused’ libraries are smaller and less complex and therefore allow more thorough and cost-effective screening. However, ‘focused’ libraries are inherently limited in their discovery potential because only well-annotated genes are included. Given that a significant fraction of human genes are poorly annotated in terms of their biological functions, the main appeal of genome-wide libraries is the discovery of novel functions for new genes.