This study has focused on false negative rates among long dsRNAs used in Drosophila RNAi screens in cultured cells. Although the exact rates will vary depending on the reagent library, assay design, and the level of statistical noise, our analysis provides a detailed example of the issues that need to be considered carefully in the data analysis of an RNAi screen. Importantly, other RNAi reagents, such as siRNAs, shRNAs, and siRNA pools used in mammalian RNAi screens, have their own false positive and false negative rates and these are not necessarily the same as what we observed with Drosophila long dsRNAs. Regardless of the reagent used, however, any false negative rate significantly above zero will cause genes to be missed in an RNAi screen. Likewise, as shown in the K/P screen, even a very low false positive rate among the set of reagents can yield a very high proportion of false positives when expressed as a percentage of the hits obtained in an individual screen. Finally, our study illustrates how transcriptome data from the cell lines can be included as part of the data analysis to eliminate false positives.
The existence of false negatives due to ineffective RNAi reagents necessitates strategies for reducing their effects on the outcomes of RNAi screens. One obvious approach to minimize false negatives in screens is to use multiple, independently screened reagents per gene, as done in some recent RNAi screens [29]. In principle, use of multiple reagents per gene should reduce the number of false negatives, as a single ineffective RNAi reagent would be compensated by those that are effective. An obvious caveat to this, however, is that simply by including more reagents, the number of false positive results will also increase.
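The trade-off described above can be written down in a minimal form. The symbols are assumptions introduced here for illustration and do not appear in the original text: k is the number of reagents per gene, \alpha is the per-reagent false positive rate, and \beta is the per-reagent false negative rate. The sketch further assumes that reagents err independently and that a gene is called a hit whenever any one of its reagents scores.

```latex
% Assumed symbols (not from the source): k reagents per gene,
% \alpha = per-reagent false positive rate,
% \beta  = per-reagent false negative rate.
% Assumption: reagents err independently; a gene is a hit if any reagent scores.
\begin{align*}
P(\text{true hit missed by all } k \text{ reagents}) &= \beta^{k} \\
P(\text{non-hit gene scores with at least one of } k \text{ reagents})
  &= 1 - (1-\alpha)^{k} \approx k\alpha \qquad (\alpha \ll 1)
\end{align*}
```

Under these assumptions the gene-level false negative probability falls geometrically with k, while the gene-level false positive probability grows roughly linearly with k.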
To explore how multiple RNAi reagents per gene could affect the outcome of a screen and to determine the best strategy for disambiguating results when different reagents yield inconsistent results, we devised a simple model of one, two, and three reagents per gene (Table ). Furthermore, we examined three simple generalized disambiguation approaches and modeled how these approaches would affect the outcome of a screen. These disambiguation approaches are as follows: a lenient approach wherein a gene is considered a hit if any RNAi reagent directed against that gene scores above some threshold (Table , Rule A); a stringent approach that requires all reagents directed against the same gene to score (Table , Rule B); and an intermediate approach that requires more than half of the reagents directed against the same gene to score (Table , Rule C). For the purpose of this model, an RNAi "mini-pool" of reagents, such as is sometimes used for mammalian siRNA knockdown, or combinatorial knockdown with multiple dsRNAs, counts as a single RNAi reagent unless the individual components are tested separately.
Model of RNAi reagent disambiguation methods under one, two or three reagents per gene.
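The three disambiguation rules can be sketched as simple predicates. This is an illustrative sketch only: the function names, the `RULES` mapping, and the representation of a gene's results as a list of booleans (one per independently tested reagent, `True` if that reagent scored above threshold) are assumptions made here, not part of the original model.

```python
from typing import Callable, Dict, Sequence


def rule_a_lenient(scores: Sequence[bool]) -> bool:
    """Rule A: the gene is a hit if any reagent scores."""
    return any(scores)


def rule_b_stringent(scores: Sequence[bool]) -> bool:
    """Rule B: the gene is a hit only if every reagent scores."""
    return len(scores) > 0 and all(scores)


def rule_c_majority(scores: Sequence[bool]) -> bool:
    """Rule C: the gene is a hit if more than half of the reagents score."""
    return 2 * sum(scores) > len(scores)


# Hypothetical lookup table keyed by the rule letters used in the text.
RULES: Dict[str, Callable[[Sequence[bool]], bool]] = {
    "A": rule_a_lenient,
    "B": rule_b_stringent,
    "C": rule_c_majority,
}


def call_gene(scores: Sequence[bool], rule: str) -> bool:
    """Apply one of the three rules to the per-reagent results for one gene."""
    return RULES[rule](scores)
```

With one reagent per gene the three rules coincide. With two reagents per gene, "more than half" means both must score, so Rule C coincides with Rule B. A mini-pool that is not deconvoluted would contribute a single boolean.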
To illustrate the model, we chose as an example three hypothetical Drosophila genome-wide dsRNA libraries with false negative and false positive rates of 10% and 1%, respectively (Figure ). The model shows that the strategy used to disambiguate results from multiple reagents is critical when interpreting results from a library with more than one independently tested reagent per gene. In a hypothetical library with three reagents per gene, a lenient interpretation (requiring one or more of three reagents to score) results in few false negatives but an extremely high number of false positives in the outcome of a screen (Table and Figure , Rule A). In this scenario, the presence of multiple reagents per gene virtually eliminates false negatives, but at the cost of a high number of false positives. Our K/P JAK/STAT screen illustrates this: it would have a 62% final false positive rate (expressed as a percentage of hits) if interpreted this way. A stringent disambiguation (requiring all three reagents to score) results in few false positives but a high number of false negatives (Tables and Figure , Rule B).
Figure 6. Number of false negatives and false positives under hypothetical screening scenarios. We assume a false positive rate of 1% and a false negative rate of 10%, a scenario of 100 "true hits" in the library, and a library targeting 13,735 protein-encoding ...
A third possible strategy for libraries with three reagents per gene (Table 3 and Figure , Rule C) requires two out of three RNAi reagents to score. This disambiguation method achieved a balance of false negatives and false positives, resulting in low numbers of each relative to what would be achieved by screening a single dsRNA per gene. Thus, adding reagents per gene can greatly reduce false negative rates in screens, but can also greatly increase the number of false positives in the absence of careful disambiguation.
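The comparison of the three rules can be approximated with a short binomial calculation. This is a sketch under stated assumptions, not the authors' implementation: it assumes that reagents targeting the same gene score independently, that the 13,735 figure in the Figure 6 legend is the number of genes in the library, and that the 100 true hits are among them. All function and parameter names are hypothetical.

```python
from math import comb


def prob_gene_called(p_score: float, n_reagents: int, min_scoring: int) -> float:
    """P(at least min_scoring of n_reagents independent reagents score)."""
    return sum(
        comb(n_reagents, k) * p_score**k * (1 - p_score) ** (n_reagents - k)
        for k in range(min_scoring, n_reagents + 1)
    )


def min_scoring_required(rule: str, n_reagents: int) -> int:
    """Number of scoring reagents needed to call a gene a hit under each rule."""
    if rule == "A":  # lenient: any reagent
        return 1
    if rule == "B":  # stringent: all reagents
        return n_reagents
    if rule == "C":  # intermediate: more than half
        return n_reagents // 2 + 1
    raise ValueError(f"unknown rule: {rule!r}")


def expected_errors(
    n_reagents: int,
    rule: str,
    fp_rate: float = 0.01,   # per-reagent false positive rate (from the text)
    fn_rate: float = 0.10,   # per-reagent false negative rate (from the text)
    true_hits: int = 100,    # true hits in the library (Figure 6 legend)
    n_genes: int = 13_735,   # ASSUMPTION: genes targeted by the library
) -> dict:
    """Expected gene-level false negatives and false positives for one screen."""
    needed = min_scoring_required(rule, n_reagents)
    p_true_called = prob_gene_called(1 - fn_rate, n_reagents, needed)
    p_nonhit_called = prob_gene_called(fp_rate, n_reagents, needed)
    true_positives = true_hits * p_true_called
    false_negatives = true_hits * (1 - p_true_called)
    false_positives = (n_genes - true_hits) * p_nonhit_called
    return {
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        # Share of called hits that are false, the quantity quoted "as a percentage of hits".
        "false_fraction_of_hits": false_positives / (true_positives + false_positives),
    }


if __name__ == "__main__":
    for n, rule in [(1, "A"), (3, "A"), (3, "B"), (3, "C")]:
        print(n, rule, expected_errors(n, rule))
```

Under these assumptions the sketch gives roughly 10 false negatives and 136 false positives for a single reagent per gene; about 0.1 and 405 for three reagents under Rule A; about 27 and 0.01 under Rule B; and about 2.8 and 4 under Rule C. These are outputs of the sketch, not values read from Figure 6, and correlated reagent failures (for example, shared off-target sequences) would change them.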
In our model of Drosophila cell-based RNAi screens, a library with three dsRNAs per gene, wherein discrepancies are disambiguated by requiring two of three dsRNAs to score, achieves a good balance between false negatives and false positives. For RNAi reagents with significantly different reagent-level false positive and false negative rates, a different number of reagents with a different disambiguation strategy may be more appropriate. Indeed, several groups have proposed using four or more siRNAs per gene in mammalian siRNA screens [4]. Moreover, our model and disambiguation strategy are based on a simple binary interpretation of hits, but other, more quantitative approaches have been proposed that do not require a screener to designate individual reagents as hits or non-hits. A recently described approach for disambiguating image-based RNAi screens, quantitative multiparametric image analysis (QMPIA), can be applied to complex screens with a very large number of read-outs [29]. A more broadly applicable quantitative disambiguation approach, the redundant siRNA activity (RSA) method [31], requires only one read-out per RNAi experiment. Regardless of the disambiguation approach used, screeners must carefully interpret results obtained with multiple reagents per gene in order to reduce false negative results without increasing the number of false positive results to an unacceptably high level.