Home | About | Journals | Submit | Contact Us | Français |

**|**Nucleic Acids Res**|**v.36(14); 2008 August**|**PMC2504311

Formats

Article sections

Authors

Related links

Nucleic Acids Res. 2008 August; 36(14): 4667–4679.

Published online 2008 July 15. doi: 10.1093/nar/gkn435

PMCID: PMC2504311

Xiaohua Douglas Zhang,^{1,}^{*} Pei Fen Kuan,^{2} Marc Ferrer,^{3} Xiaohua Shu,^{4} Yingxue C. Liu,^{1} Adam T. Gates,^{5} Priya Kunapuli,^{3} Erica M. Stec,^{3} Min Xu,^{5} Shane D. Marine,^{3} Daniel J. Holder,^{1} Berta Strulovici,^{3} Joseph F. Heyse,^{6} and Amy S. Espeseth^{7}

*To whom correspondence should be addressed. Phone: +1 215 652 0522, Fax: +1 215 993 1835, ; Email: Xiaohua_zhang/at/merck.com

Received 2008 May 19; Revised 2008 June 18; Accepted 2008 June 23.

Copyright © 2008 The Author(s)

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article has been cited by other articles in PMC.

RNA interference (RNAi) is a modality in which small double-stranded RNA molecules (siRNAs) designed to lead to the degradation of specific mRNAs are introduced into cells or organisms. siRNA libraries have been developed in which siRNAs targeting virtually every gene in the human genome are designed, synthesized and are presented for introduction into cells by transfection in a microtiter plate array. These siRNAs can then be transfected into cells using high-throughput screening (HTS) methodologies. The goal of RNAi HTS is to identify a set of siRNAs that inhibit or activate defined cellular phenotypes. The commonly used analysis methods including median ± *k*MAD have issues about error rates in multiple hypothesis testing and plate-wise versus experiment-wise analysis. We propose a methodology based on a Bayesian framework to address these issues. Our approach allows for sharing of information across plates in a plate-wise analysis, which obviates the need for choosing either a plate-wise or experimental-wise analysis. The proposed approach incorporates information from reliable controls to achieve a higher power and a balance between the contribution from the samples and control wells. Our approach provides false discovery rate (FDR) control to address multiple testing issues and it is robust to outliers.

RNA interference (RNAi) is a naturally occurring pathway for the regulation of gene expression in which small double-stranded RNAs (siRNAs) lead to the destruction of mRNAs with complementary nucleoside sequences (1). This pathway can be co-opted by experimentally introducing synthetic 19–21-mer double stranded RNAs designed to target specific mRNAs, thus knocking down the expression of the protein of interest (2). RNAi offers an effective way of silencing a gene and has been seen as the third class of drugs, after small molecules and proteins (3). The importance of RNAi was further recognized when the Nobel Prize in Medicine and Physiology was awarded to Drs Fire and Mello in 2006 for their research in this field. RNAi can be utilized on a genome-wide scale via high-throughput screening (HTS) technology that allows thousands of siRNAs to be tested in an unbiased manner to identify previously unknown genes involved in a biological pathway. The goal of RNAi HTS is to identify a set of siRNAs that inhibits or activates the cellular phenotype being evaluated. This process is called hit selection. One of the most fundamental challenges in HTS is to glean biological significance from mounds of data, which relies on the development and adoption of appropriate statistical designs and analytic methods for quality control and hit selection (4).

For hit selection, there are two major types of approaches: one is the use of analytic metrics to assess and rank the size of siRNA effects and the other is the use of hypothesis testing to control false positive and false negative rates (5). In the first approach, fold change, mean difference, percent activity, percent viability, percent inhibition and strictly standardized mean difference (SSMD) have already been proposed and explored in the literature (5–10). In the second approach, the most commonly used methods are the use of *z*-score or *t*-test for testing the null hypothesis that no difference exists between the means, i.e. mean ± *k*SD or median ± *k*MAD (median absolute deviation) (11–17). These methods usually control the false positive and false negative rates based on a single test. Given that a large number of siRNAs are tested in an HTS assay, the false positive rate will be inflated. Hence, one issue in these methods is the adjustment of error rates in multiple hypothesis testing.

Another issue arising from these classical approaches is whether to perform plate-wise or experiment-wise analysis. The plate-wise analysis can adjust for different systematic errors within each plate. Previously, we demonstrated that a plate-wise analysis has a higher probability of detecting true, modestly effective, hits than an experiment-wise analysis when there are not cluster(s) of true hits in any of the plates (17). However, plate-wise analysis may produce misleading results if a cluster of active siRNAs is located within a single plate. An experiment-wise analysis is not affected by the distribution of active siRNAs between plates; however, it cannot adjust for systematic errors within each plate. Current strategies require a decision on whether to use plate-wise or experiment-wise analysis. Finally, all the above methods of hit selection utilize information from only a negative reference. It remains unresolved whether a negative control or the majority of sample wells should be used as the negative reference to capture information on the variability (18).

In this article, we propose a methodology based on a Bayesian framework for hit selection to address the issues encountered in the classical approaches described above. Newton *et al*. (19) developed a Bayesian method to control false discovery rate (FDR) for analyzing microarray data. Like Newton *et al*.'s method, our proposed Bayesian method controls FDR via a direct posterior probability approach. In addition, to adapt the Bayesian FDR methods to solve issues of hit selection in genome-scale RNAi screens, our method has the following unique features. First, the proposed method incorporates information from reliable controls; thus it maintains a balance between the contributions from the sample and control wells. For screens employing effective positive and negative controls, an approach that utilizes this information will be more powerful in hit selection. Second, the proposed method implements robust statistical processes; thus it is robust to outliers. Third, the priors of the proposed method incorporate experiment-wise information whereas the likelihoods are constructed in a plate-wise base. Hence, this approach allows for sharing of information across plates in a plate-wise analysis and thus, eliminates the need to decide between a plate-wise or experiment-wise analysis. We illustrate the performance of our approach via both the case studies and simulations.

An RNAi HTS experiment is usually conducted in 96-well, 384-well or 1536-well plates in which the cells in a well are treated with a unique siRNA or a control. A typical RNAi HTS project starts with a primary screen of about 20 000 to 50 000 siRNAs most of which have no replicate. The effective siRNAs identified (called ‘hits’) in the primary screen are further investigated using one or more confirmatory screens in which each siRNA has replicates. Currently, a typical primary screen may have 50–200 384-well plates and a typical confirmatory screen may have 3–20 384-well plates. The measured response is usually the intensity emitted by labeled particles such as fluorescent dyes. For simplicity, in this article, the term ‘intensity of an siRNA’ is used to refer to ‘the measured intensity of a phenotype corresponding to the treatment of an siRNA’, which may be light intensity, the intensity emitted by a dye or the ratio of intensity emitted by two dyes depending on individual experiments. Considering mathematical and computational convenience for modeling the observations with normal distribution, we typically log transform the raw intensity for statistical analysis. All experiments include a positive control with specific knockdown effects (usually occupying 4–16 wells) and a negative control with nonspecific knockdown effects (usually occupying 4–20 wells) in each plate.

Let *X*_{1}, *X*_{2}, …, *X _{K}* be the log-transformed intensities of sample siRNAs in a plate. Since each sample well consists of unique siRNAs targeting different genes, we assume that each well has its own distribution. Assume that the likelihood function is given by

1

where μ* _{i}* is the unknown mean for sample siRNA

Let the prior distribution for μ* _{i}* be π(μ

2

where θ_{0} and τ^{2} are the center and variance of a negative reference, i.e. an siRNA with no specific effects. θ_{0} is preestimated using the mean of sample siRNA wells excluding outliers in a plate. This model (i.e. Model 1) is based on the assumption that most of the sample siRNAs do not have an activation/inhibition effect. We could also use a uniform prior if no prior information is available for μ* _{i}*. However, this will be equivalent to the classical z-score method or mean ±

In cases where both effective positive (activation and inhibition) and negative controls are available and reliable, we propose using a prior that incorporates the information from these controls for selecting activating or inhibiting siRNAs, which leads to Model 2. Let the prior be a mixture distribution

3

where *p*_{0} is the prior proportion of nonhits, *p*_{1} is the prior proportion of activation hits, *p*_{2} is the prior proportion of inhibition hits and *p*_{0} + *p*_{1} + *p*_{2} = 1. The densities π_{0}, π_{1} and π_{2} describe the fluctuations of μ* _{i}* within each hypothesis. We assume that and . Other choices of π

In a typical RNA HTS experiment, the negative control wells are from an siRNA with unspecific effects, e.g. luciferase in our case study for HIV. The observed intensities from the negative control wells in a plate can be assumed to be independent and identically distributed as where μ* _{N}* is the mean of the negative control. In many cases, the variability of the negative controls can be assumed to be the same across plates within each experiment. Therefore, we can pool the variability estimates across the plates to allow information sharing. This is also motivated by the fact that the number of negative controls in each plate is usually small; thus if we pool the information across the plates we can obtain a better estimate. To reduce the impact of outliers, we calculate the deviation of each well for a negative control from the median of the negative control in each plate and exclude outliers identified experiment wise using boxplot parameters. That is, let

The unconditional variance of *X _{i}* is where

For our Model 1, using the likelihood given in Equation (1) and prior distribution Equation (2), the posterior distribution of μ* _{i}* is given by

4

Our quantity of interest is μ* _{i}* in this model. From posterior distribution Equation (4), the Bayesian framework here allows for the inference on each sample siRNA incorporating both its own intensity

For our Model 2, using the likelihood function Equation (1) and prior distribution Equation (3), the posterior distribution of μ* _{i}* is given by

5

where

and m=0,1,2

's are estimated using the EM algorithm similar to Newton *et al*. (19), and is the density function of standard normal distribution. From posterior distribution Equation (5), the Bayesian framework here allows for the inference on each sample siRNA incorporating both its own intensity *X _{i}* and the mixture center of the controls , respectively, weighted by the variability of the sample siRNAs distribution and of the negative reference.

We are interested in the hypotheses *H*_{0}: siRNA *i* {no effect}, *H*_{1}: siRNA *i* {activation effect} and *H*_{2}: siRNA *i* {inhibition effect}. An HTS experiment typically consists of approximately 30 000 siRNAs of interest. In other words, we are conducting ~30 000 hypothesis tests simultaneously. Given the large number of hypothesis tests, an effective way of controlling the error rate is by using the notion of FDR (19–21). We use the direct posterior probability approach to control FDR (19). One key in this approach is the calculation of the probability that an siRNA falls under each hypothesis given the data, namely *P*(*H _{m}* |

Suppose large (small) values of μ* _{i}* relative to θ

Given *P*(*H _{m}*|

6

We rank the siRNAs according to increasing values of β* _{i}* and declare the first

7

Therefore, the expected FDR, given the data, is *C*(κ)/*J*. Similarly, we can obtain the FDR for selecting inhibition hits only by replacing β* _{i}* = 1−

With the control of FDR calculated using the above method, we may have two major strategies to select hits in RNAi HTS screens. One is to set up a prespecific FDR and search a list of *J* siRNAs with the smallest posterior probabilities under the null hypothesis, such that the expected FDR (namely *C*(κ)/*J*) given the data equals to the prespecific FDR value. The other is to fix the number (*M*) of hits to be selected for further analysis, for which we can rank the siRNAs according to increasing posterior probabilities under the null hypothesis and select the first *M* siRNAs as hits. The corresponding FDR is then the sum of their posterior probabilities under the null hypothesis divided by *M*. In addition, if we are interested in classifying each siRNA as either *H*_{0} (no effect), *H*_{1} (activation) or *H*_{2} (inhibition), this can be achieved by declaring sample siRNA *i* to be *H*_{k} if .

We conducted two simulation studies to investigate the performance of our Bayesian approach. The first simulation explores the situation where each plate has about the same number of hits whereas the second simulation explores the situation in which some of the plates have an enriched number of hits in each experiment. In both studies, we simulated data for 100 384-well plates in one run, each plate having a negative control, an activation control, an inhibition control, true activation hits, true inhibition hits and nonhits. Specifically, in a plate there were 12 draws from normal distribution *N*(θ_{0}, σ^{2}) for the negative control, 10 draws from *N*(θ_{0} + 4σ, σ^{2}) for an activation control, 10 draws from *N*(θ_{0} − 4σ, σ^{2}) for an inhibition control, *n*_{1} draws from *N*(θ_{0} +*h*σ, σ^{2}) for activation hits, *n*_{2} draws from *N*(θ_{0} − *h*σ, σ^{2}) for inhibition hits and 384−32−*n*_{1}−*n*_{2} draws from *N*(θ_{0}, τ^{2}) for nonhits. θ_{0}, τ and σ were sampled from three continuous uniform distributions U(4, 5), U(0.5, 0.6) and U(0.35, 0.45), respectively. The three continuous uniform distributions come from actual RNAi HTS experiments.

For the first simulation study, the number of true hits, *n*_{1} and *n*_{2}, were both drawn from the discrete uniform distribution U[0, 10] for each run of the *100* plates. For the second simulation study, in each run, we simulated *E* enriched plates and *100-E* regular plates where *E* was a random draw from 1 to 9. *n*_{1} and *n*_{2} were both drawn from the same discrete uniform distribution, U[50, 70], in an enriched plate and from the discrete uniform distribution, U[0, 5], in a regular plate. In both studies, the data were simulated from various scenarios for the strength of true hits. The strength of a hit is indexed by *h*, the ratio of the deviation between the mean of a hit and θ_{0} to the standard deviation of a control, i.e. *h*=(mean of a hit − θ_{0})/σ. Each scenario was repeated 200 times.

In the Methods Section, we proposed two Bayesian models for hit selection in RNAi HTS assays. In Model 1, we construct the prior using a negative reference, whereas in Model 2, we construct the prior based on a negative reference, an activation control and an inhibition control. Here we apply both Bayesian models to an HIV siRNA primary screen and a hepatitis C virus (HCV) siRNA primary screen.

A HTS was conducted to identify cellular factors associated with HIV replication. In this screen, HeLa P4/R5 cells, which produce β-galactosidase upon infection with HIV (22), were transfected with siRNAs targeting 18 601 genes. Following transfection, the cells were infected with HXB2 HIV. To allow for the identification of host factors associated with early events, the level of HIV infection (as determined by β-galactosidase activity) was assessed at 48 h postinfection. The experiment consisted of 96 384-well plates. There were 304 sample siRNA wells, 8 negative control wells (Luciferase), 4 moderate inhibition control wells (CCNT), 16 strong inhibition control wells (cell no virus) and 4 activation control wells (TSG-101).

With either model, we have two strategies to select hits with FDR control: one is to fix the number of selected siRNAs and the other is to control a prespecific FDR. For the first strategy, we can obtain the FDR level corresponding to various preset numbers of selected siRNAs. For the second strategy, we can obtain the number of selected hits corresponding to various prespecific FDR levels. In either strategy, the FDR levels can be obtained for the selected siRNAs, which effectively addresses the multiplicity issue of hit selection in RNAi HTS assays. In contrast, in mean ± *k*SD and median ± *k*MAD, *k* is commonly chosen to be 2 or 3, which is based on a single test. Table 1 displays the results for both models under various preset number of selected hits or prespecific FDR levels.

Based on Table 1, using the ranking strategy in Model 1, if we select 100 siRNAs as hits, the FDR is only 0.039; on the other hand, if we select 1000 siRNAs as hits, the FDR is 0.228. Using the strategy of directly controlling FDR, if the prespecified FDR is 0.01, only 31 siRNAs are selected as hits; on the other hand, if the prespecified FDR is 0.30, 1266 siRNAs are selected as hits. One of the goals in a primary screen is usually to select a set of 500 to 1400 siRNAs with the lowest FDR for further investigation. With this consideration, it is reasonable to select hits by controlling the FDR to be somewhere between 0.25 and 0.35 depending on the capacity available for confirmation screening or other further investigations.

Model 1 does not use any information about the strength of the positive controls. Consequently, the hit selection results using Model 1 are not affected by the effectiveness of any positive controls in the experiment (Figure 1A and B). On the other hand, Model 2 does incorporate the strength of positive controls. Thus, hit selection results using Model 2 are highly affected by the positive controls adopted in the experiment. In the HIV RNAi screen, Model 2 identifies many more inhibition hits (i.e. 439 inhibition hits) using the moderate inhibition control compared to those (i.e. 27 inhibition hits) identified using the strong inhibition control (Figure 1C and E) when FDR = 0.20. This is because more sample siRNAs are similar to the moderate control than to the strong control.

Hits identified by Bayesian methods with FDR control. In each panel, green, red, light pink, grey, yellow, blue and light blue points represent a negative control, moderate inhibition control, strong inhibition control, fairly strong activation control, **...**

Using our Bayesian models, we get an estimated probability of each siRNA being in each of the three hypotheses: *H*_{0} (no effect), *H*_{1} (activation) and *H*_{2} (inhibition). According to the maximal posterior probability among these hypotheses, we can classify each sample siRNA into one of the three categories: no effect, activation and inhibition. In this classifying strategy, we can also obtain the corresponding FDR by summing up the posterior probabilities of all selected hits under the null hypothesis divided by the number of hits. The results are listed in the bottom rows of Table 1. Using the classifying strategy based on Model 1, we identified 816 siRNAs as activation hits and 536 siRNAs as inhibition hits with a corresponding FDR of 0.312 (Panel A of Table 1 and Figure 1B). As described earlier, classification results based on Model 2 are strongly affected by the strength of the positive controls. For example, there are 2439 siRNAs classified as inhibition hits with a FDR of 0.366 if the moderate inhibition control is used, whereas there are only 41 siRNAs classified as inhibition hits with a FDR of 0.282 if the strong inhibition control is used (Panel B of Table 1 and Figure 1D and F). This result hints that many siRNAs have moderate to weak inhibition effects whereas only a few siRNAs are strongly inhibitory.

When using the commonly used methods such as mean ± *k*SD and median ± *k*MAD, we also need to decide whether we should apply the methods experiment- or plate-wise. In the HIV screen, the between-plate variability is fairly large. Clearly, the data in plates 1–21 shift down while the data in plate 79 shift up compared with the data in the remaining plates. When applying median ± 3MAD experiment wise, the inhibition hits are mostly selected from Plates 1 to 21 and the activation hits are mostly selected from Plate 79 and plates 22 to 44 (Figure 2C and D). Clearly these results are dominated by the systematic errors and are unreliable. The plate-wise hit selection results are more reasonable (Figure 2A and B). That is one of the major reasons that median ± 3MAD is applied plate wise more often than experiment wise. On the other hand, an experiment-wise analysis is not affected by the distribution of active siRNAs between plates whereas a plate-wise analysis may produce misleading results when a cluster of active siRNAs is located within a single plate. Our Bayesian methods allow for sharing of information (experiment-wise) across plates in a plate-wise analysis and thus, avoid the need for a decision between plate-wise or experiment-wise analysis. The distribution of selected hits across the plates using our Bayesian methods also looks reasonable (Figure 1A–F).

Hit selection results in the HIV screen, identified by median ± 3MAD plate wise (**A** and **B**) or experiment wise (**C** and **D**) and using the majority of sample wells (**A** and **C**) or a negative control (**B** and **D**) as a negative reference. In each panel, green, **...**

In the methods of mean ± *k*SD and median ± *k*MAD, there is a need to decide whether a negative control or the majority of sample wells should be used as a negative reference to calculate the mean (or median) and SD (or MAD). Considering the fact that median ± *k*MAD is more robust to outliers, we concentrate on the comparison of median ± *k*MAD with our Bayesian methods. For median ± 3MAD, if using the negative control as a negative reference, we will select too many hits either plate wise (i.e. 8808 hits shown in Figure 2B) or experiment wise (i.e. 2138 hits shown in Figure 2D). On the other hand, if using the majority of sample wells as the negative reference, we will select much fewer hits (i.e. 106 hits shown in Figure 2A and 29 hits shown in of Figure 2C). Our Bayesian models do not have such an issue in the determination of a reference because, from the posteriors shown in formulas (4) and (5), our Bayesian models naturally combine the information in both the negative control and sample wells. The numbers of selected hits using our Bayesian methods are reasonable (e.g. 685 hits with a FDR of 0.20 shown in Figure 1A and 1352 hits with a FDR of 0.312 shown in Figure 1B). These methods control FDR to adjust for multiplicity issues and provide each siRNA with more than a yes/no answer, but rather an estimate probability of being in each of three groups: no effect, activation and inhibition.

A primary siRNA HTS experiment was conducted to identify host factors associated with HCV replication, using the HCV replicon assay system described in Zuck *et al*. (23). Huh-7 cells containing an HCV genotype 1b replicon with a β-lactamase reporter were transfected with siRNA pools targeting ~19 000 genes. Three days after siRNA transfection, the cells were stained for β-lactamase expression. siRNAs that decreased β-lactamase expression were considered to affect HCV replication. The HTS was conducted using siRNAs in 97 plates, each with 384 wells (17). In this experiment, there were sample siRNAs (no replicates in the whole experiment), a strong positive control that targeted HCV (16 replicates per plate), a weaker positive control that targeted the HCV host factor, VAPA (8 replicates per plate) and a negative control (16 replicates per plate).

The results of hit selection using median ± *k*MAD experiment wise are dominated by the systematic shift in the first nine plates (Figure 3B). Plate-wise median ± *k*MAD tends to obtain more reasonable hits in this screen (Figure 3A). Using Model 1, we not only obtain more reasonable results (Figure 3C and D) but also know FDR corresponding to any group of selected hits (Table 2). Based on Model 1, more inhibition hits were identified than activation hits given a prespecific FDR in the screen. For example, given a prespecific FDR of 0.20, we identified 290 inhibition hits and 66 activation hits (Panel A of Table 2 and Figure 3C). Using the classifying strategy of Model 1, we identify 591 inhibition hits and 150 activation hits, resulting in a FDR of 0.32 (Panel A of Table 2 and Figure 3D). To obtain a reasonable number of hits for further investigation while considering the control of FDR, we may fix the FDR to be somewhere between 0.25 and 0.40 for hit selection in this screen.

Hit selection results in the HCV screen, identified by median ± 3MAD plate wise (**A**) and experiment wise (**B**) using the majority of sample wells as a negative reference as well as by Bayesian Model 1 (**C** and **D**) and Bayesian Model 2 with the weaker **...**

There was no control for identifying siRNAs that enhanced HCV replication in the screen. Hence, strictly speaking, the mixture model cannot be applied to this screen. However, one control appeared to act as an activation control (gray points in Figure 3). Therefore, we applied this control as an activation control and apply Model 2 to identify hits in the inhibition direction only. The numbers of selected hits corresponding to various prespecified FDRs are shown in Panel B of Table 2. Note, this siRNA was not designed to be an activation control and is not stable, even resulting in a median signal less than that of the negative reference in a few plates (Figure 3). If one uses Model 2 with this control to identify hits with activation effects, one would obtain misleading results: some selected activation hits have values less than the median of the negative control in some plates (Figure 3E and F). Hit selection using the classifying strategy of Model 2 is also strongly affected by the quality of positive controls, and can thus generate similar misleading results (Panel B of Table 2 and Figure 3F).

Receiver operating characteristic curve (ROC) analysis is commonly used to compare the performance of different methods in experiments in which both positive and negative controls are available and reliable. Here, we use ROC analysis to compare the performance of our proposed Bayesian approaches with plate-wise and experiment-wise median ± *k*MAD methods in both the HIV and HCV screens. The performance of each method is assessed via sensitivity (true positive rate) and specificity (true negative rate). We used the negative control to define the true negative set in either screen. In the HIV screen, we used all three positive controls (i.e. activation, moderate inhibition and strong inhibition controls) to define the true positive set. In the HCV screen, we used the two inhibition controls to define the true positive set. The ROCs of five methods (i.e. plate-wise median ± *k*MAD, experiment-wise median ± *k*MAD, Bayesian Model 1, Model 2 using the weaker inhibition control and Model 2 using the stronger inhibition control) in the HIV and HCV screens are given in Figure 4A and B, respectively.

ROC curves for detecting all three positive controls in the HIV siRNA screen (**A**) and for detecting two inhibition controls in the HCV siRNA screen (**B**). The negative control in each screen is used to calculate the specificity in the screen.

As evident in the ROC curves for the HIV screen in Figure 4A, Bayesian Model 1 has the highest power to detect the three positive controls given any false positive rate in the range of interest (i.e. between 0 and 0.05); Bayesian Model 2 has powers higher than experiment-wise median ± *k*MAD but lower than plate-wise median ± *k*MAD and plate-wise median ± *k*MAD performs slightly worse than Bayesian Model 1. As evident in the ROC curves for the HCV screen in Figure 4B, Bayesian Model 1 performed better than plate-wise median ± *k*MAD. Bayesian Model 2 using the weaker inhibition control performs better than Model 1 when the false positive rate is in the range of 0 and 0.01 and worse than Model 1 when the false positive rate is in the range of 0.01 and 0.05. In both screens, experiment-wise median ± *k*MAD always has a lower power than plate-wise median ± *k*MAD given a false positive rate between 0 and 0.1. Bayesian Model 2 with the strong inhibition control did not perform well in either screen.

The above results describing the performance of different methods on hit selection in siRNA HTSs are based on only two experiments. Thus, we conducted simulation studies as described in the Method Section to investigate the performance of our Bayesian approach in general. An exemplary run of simulated data in the first simulation study is shown in Figure 5A1.

Simulated data and results. **A1** shows a simple realization of 100 plates in 1 run of the first simulation with *h* = 1 where . In this panel, green, red, gray and yellow points represent a negative control, inhibition control, activation control and nonhits, **...**

We compared the performance of the Bayesian approach with the commonly used median ± *k*MAD method via area under the receiver operating characteristic curve (AUROC). For each method, the sensitivity and specificity were computed at various target errors α. *k* was chosen as Φ^{−1}(1−α/2) where Φ is the cumulative distribution function of the standard normal distribution. The sample wells with values greater than median+*k*MAD or less than median−*k*MAD were declared to be hits. For the Bayesian approach, the target error α was the FDR. The negative control was used to calculate specificity and both activating and inhibiting true hits were used to calculate sensitivity. ROC curves in one run of 100 plates are shown in Figure 5A2. For each run, the AUROC was then computed as the area under a ROC curve. In real RNAi HTS experiments, the value of the approach is determined by its sensitivity in situations where the false positive rate is below 0.05 experiment wise, which is equivalent to a specificity of above 0.95. Therefore, we calculated the AUROC in the situation where the specificity is between 0.95 and 1. When the specificity is above 0.95, a null method that randomly declares hits has a value of 0.00125 for AUROC and a perfect method has a value of 0.05 for AUROC. Figure 5B1–5 show the AUROC results in the first study and C1–5 show the AUROC results in the second study (Figure 5). Each panel summarizes the AUROC for 200 runs from different strengths of true hits.

From B1–5 of Figure 5 for the first study, Bayesian Model 1 overall has the largest AUROC in all scenarios, especially when the true hits have weak to moderate effects. Employing plate-wise median ± *k*MAD tends to result in an AUROC slightly less than Bayesian Model 1. Using experiment-wise median ± *k*MAD resulted in the smallest AUROC of all three scenarios tested. Bayesian Model 2 resulted in an AUROC between Bayesian Model 1 and experiment-wise median ± *k*MAD. Therefore, the first simulation study suggests that, in an experiment without any enriched plates, Bayesian Model 1 is the most effective method, especially with regard to detecting hits with weak or moderate effects; followed by the use of the plate-wise median ± *k*MAD method, Bayesian Model 2 and finally by the use of the experiment-wise median ± *k*MAD method.

We proposed a methodology for hit selection in genome-scale RNAi HTS based on a Bayesian framework. The proposed methodology aims to address issues arising from classical approaches including median ± *k*MAD. One issue is the adjustment of error rate in multiple hypothesis testing. In this article, we construct Bayesian models to control FDR via the direct posterior probability approach (19). The second issue is the decision of whether a plate-wise or experiment-wise analysis should be used. The results in both HIV and HCV siRNA screens show that selected hits using an experiment-wise analysis can be dominated by systematic errors in a plate. A plate-wise analysis is more robust in the presence of systematic errors, but can result in false negatives in plates containing a cluster of true hits. Our proposed Bayesian methodology allows data to be shared across the plates of an experiment in a plate-wise analysis, thus obviating the need for choosing either plate-wise or experiment-wise analysis. The third issue is the use of controls: Should the negative control or the majority of sample wells be used as the negative reference when calculating the center and variability of the raw data? How should the information from positive and negative controls with varying degrees of efficacy be incorporated into the hit selection strategy? The choice of prior in our Bayesian methods enables us to incorporate information from various controls. From the posteriors in the proposed models, our Bayesian approach maintains a balance between the contribution from the sample wells and control wells. In addition, outliers are identified and excluded in the estimation of parameters in our Bayesian methods; thus our methods are robust to outliers. Finally, the Bayesian approach provides each siRNA with more than a yes/no answer, but rather an estimate probability of being in each of three groups: activation, inhibition and no effect.

We applied the Bayesian methods to two real genome-scale RNAi screens with different features. The HCV screen did not have a true activation control whereas the HIV screen did. The inhibition controls had data variability larger than the negative control in the HIV screen whereas the inhibition controls had data variability smaller than the negative control in the HCV screen. The strength of positive controls in the two screens also differed from each other. By using the Bayesian methods for hit selection we describe here, we were able to address effectively the issues of hit selection common to both screens and to obtain a reasonable pool of selected hits for both screens despite the differences between the two.

We focus on two Bayesian models for hit selection in RNAi HTS assays. In Model 1, we construct the prior using a negative reference, whereas in Model 2, we construct the prior based on a negative reference, an activation control and an inhibition controls. We apply both Bayesian models for hit selection in an HIV siRNA primary screen and an HCV siRNA primary screen. The applications show that the hit selection results using our proposed methods, especially Model 1, are more reasonable than those using classical methods (Figures 2–4). ROC analysis in the case studies shows that Model 1 is more powerful than the commonly used classical approaches. When the positive controls are moderately effective and of high quality, Model 2 is also powerful (Figure 4B). Simulation studies show that, in an experiment without any enriched plates, in general, Bayesian Model 1 has the best performance especially in detecting hits with weak or moderate effects; while Bayesian Model 2 performs better than the experiment-wise median ± *k*MAD method and has a good performance in detecting hits with effects equal to or stronger than the positive controls (Figure 5B1–5). In an experiment with 1 to 9 enriched plates, in general, Bayesian Model 1 has the best performance in detecting hits with weak or moderate effects, while Bayesian Model 2 performs equivalently to classical experiment-wise methods in detecting hits with strong effects (Figure 5C1–5).

Bayesian Models 1 and 2 allow for three strategies to select hits: (i) fix the number of selected siRNAs, (ii) control a prespecific FDR or (iii) classify an siRNA based on its maximal posterior probability. In a typical primary siRNA HTS screen, one of the goals is usually to select a practical number (usually some number between 500 and 1500) of hits for further investigation. Therefore, there is a motivation to use the first strategy for hit selection in a primary screen. Our Bayesian methods allow selecting a fixed number of siRNAs with the lowest FDR. On the other hand, we cannot simply select a fixed number (say 1000) from every screen and ignore the error rate because different screens may have very different error rates. Thus, the second strategy may be more plausible especially when we want to control some fixed error rate in most screens. Considering both HIV and HCV screens discussed in this article, a reasonable choice of a fixed FDR is somewhere between 0.25 and 0.35, which allows us not only to fix a reasonable number of siRNAs for further analysis but also to control the FDR within reasonable range. The classifying strategy under Model 1 also works effectively in the two screens, which yields an FDR of around 0.30 and a reasonable number of selected hits (i.e. 1352 hits in the HIV screen and 741 hits in the HCV screen).

Model 1 does not use any information about the strength of positive controls. Consequently, the hit selection results using Model 1 are robust to the quality and strength of any positive controls used in the experiment. On the other hand, Model 2 incorporates the strength of positive controls. Thus, hit selection results using Model 2 are highly sensitive to the data quality and strength of positive controls adopted in the experiment. It is not uncommon that the strength of positive controls varies within or between experiments. It is also not uncommon that the data quality of positive controls is poor in some experiments. These facts make Model 1 more favorable than Model 2 for hit selection in RNAi HTS screens. On the other hand, the simulation studies show that Model 2 has a better performance in detecting hits with effects equal to or stronger than the positive controls when there are plates with enriched number of hits (Figure 5C1–5). If the data quality of positive controls is good and the positive controls are moderately or fairly strongly effective, Model 2 may be more powerful (Figure 4B). Thus, Model 2 may help us to identify siRNAs with a desired potency if there are high quality and reliable controls with a similar effectiveness in the screen.

In our Bayesian methods, we preestimate some parameters in the priors, resulting in a fast and efficient algorithm. There are two reasons for the preestimation. One is to reduce computational time because it is critical to keep computational time as low as possible when analyzing large data sets from high-throughput screens. The second reason is to focus on selecting hits only in plates with good data quality, because hit selection results are often misleading in plates with poor quality (24–29). In the Bayesian models described in this article, we use the fact that the unconditional variance of an observed value should be larger than the expectation of its conditional variance (i.e. ) in a plate with good quality. In a plate with poor data quality, the pooled variance from different controls (i.e. estimate of σ^{2}) could be greater than the variability of sample wells (i.e. estimate of *Var* (*X _{i}*)) in a plate. In such a case, the estimate of τ

One strategy to handle the situation with is the use of quality control methods described in (24–29) to assure all the plates have good quality during the experimental stage of a screen, thus preventing the occurrence of . Another strategy is to select hits in the plates with using non-Bayesian methods such as those described in (5–7,10,13,17) and then to put a warning on all the hits from these plates. The third strategy is to build a Bayesian hierarchical model that puts priors on τ^{2} and/or other preestimated parameters. For example, we may construct a hierarchical model as follows:
, and where θ_{0} and σ^{2} are preestimated as in Model 1. This leads to the posteriors:

and

Based on these posteriors, we make inference on μ* _{i}* using direct sampling in the Monte Carlo simulation.

There are two major issues with hierarchical models: one is the long computational time and the other is that plates with good or poor quality are treated in the same way. For example, for the simple hierarchical model described above, we need to run direct sampling on each of 25 000 siRNAs tested. The number of iterations in the Monte Carlo simulation for each siRNA has to be large. Otherwise, due to sampling error, we may obtain incorrect results: i.e. a more effective siRNA may have less chance to be selected as a hit than a less effective siRNA in the same plate. Even if only 2000 iterations are used for each siRNA, this will result in a total of 50 000 000 iterations when analyzing data from an HTS screen. Building more layers in a hierarchical model will result in even longer computational time, especially when there is at least one posterior requiring an indirect sampling in the Monte Carlo simulation (30). Therefore, in genome-wide RNAi primary screens, we do not recommend the use of Bayesian hierarchical models (especially those requiring the use of indirect sampling) although we may use them in confirmatory screens.

In summary, hit selection in genome-scale RNAi research is important for identifying targets for a new class of therapeutics for treating human diseases. The development of novel effective and powerful analytic methods for hit selection is critical to glean useful information from mounds of data. In this article, we propose a methodology based on a Bayesian framework. The case studies show that our methods effectively address multiple issues in classical methods including error rate issues in multiple hypothesis testing, experiment-wise versus plate-wise analysis and incorporation of information from multiple reliable controls. Both simulation and case studies show that the Bayesian Model 1 we describe is more powerful than classical methods in detecting siRNAs with weak and moderate effects. Our Bayesian Model 2 performs better than classical experiment-wise methods when there are no plates with enriched number of hits and performs equivalently to experiment-wise classical methods in detecting hits with strong effects when there are plates with enriched number of hits in an RNAi HTS experiment.

The author would like to thank Drs Keith Soper and Raymond Bain for their support in this research, and also to thank Editor Roberts and the two anonymous referees for their helpful and cogent comments. A package of R codes (to be submitted to Bioconductor for public access) for the methods including Model 1, Model 2, the simple hierarchical model and simulations described in this article is available upon request to the corresponding author. Funding to pay the Open Access publication charges for this article was provided by Merck & Co., Inc.

*Conflict of interest statement*. All authors except P.F.K. and X.S. are employees of Merck & Co., Inc.

1. Fire A, Xu SQ, Montgomery MK, Kostas SA, Driver SE, Mello CC. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature. 1998;391:806–811. [PubMed]

2. Hannon GJ, Zamore PD. Small RNAs, big biology: biochemical studies of RNA interference. In: Hannon GJ, editor. a Guide to Gene Silencing. New York: Cold Spring Harbor Laboratory Press; 2003. pp. 87–108.

3. Bumcrot D, Manoharan M, Koteliansky V, Sah DWY. RNAi therapeutics: a potential new class of pharmaceutical drugs. Nature Chemical Biology. 2006;2:711–719. [PubMed]

4. Eisenstein M. Quality control. Nature. 2006;442:1067–1070. [PubMed]

5. Zhang XD, Ferrer M, Espeseth AS, Marine SD, Stec EM, Crackower MA, Holder DJ, Heyse JF, Strulovici B. The use of strictly standardized mean difference for hit selection in primary RNA interference high-throughput screening experiments. J. Biomol. Screen. 2007;12:497–509. [PubMed]

6. Malo N, Hanley JA, Cerquozzi S, Pelletier J, Nadon R. Statistical practice in high-throughput screening data analysis. Nat. Biotechnol. 2006;24:167–175. [PubMed]

7. Zhang XD. A new method with flexible and balanced control of false negatives and false positives for hit selection in RNA interference high-throughput screening assays. J. Biomol. Screen. 2007;12:645–655. [PubMed]

8. Gou D, Narasaraju T, Chintagari NR, Jin N, Wang PC, Liu L. Gene silencing in alveolar type II cells using cell-specific promoter in vitro and in vivo. Nucleic Acids Res. 2004;32:e134. [PMC free article] [PubMed]

9. Hsieh AC, Bo RH, Manola J, Vazquez F, Bare O, Khvorova A, Scaringe S, Sellers WR. A library of siRNA duplexes targeting the phosphoinositide 3-kinase pathway: determinants of gene silencing for use in cell-based screens. Nucleic Acids Res. 2004;32:893–901. [PMC free article] [PubMed]

10. Zhang XD. Genome-wide screens for effective siRNAs through assessing the size of siRNA effects. BMC Research Notes. 2008;1:33. [PMC free article] [PubMed]

11. Berns K, Hijmans EM, Mullenders J, Brummelkamp TR, Velds A, Heimerikx M, Kerkhoven RM, Madiredjo M, Nijkamp W, Weigelt B, et al. A large-scale RNAi screen in human cells identifies new components of the p53 pathway. Nature. 2004;428:431–437. [PubMed]

12. Boutros M, Bras LP, Huber W. Analysis of cell-based RNAi screens. Genome Biol. 2006;7:R66. [PMC free article] [PubMed]

13. Chung NJ, Zhang XD, Kreamer A, Locco L, Kuan PF, Bartz S, Linsley PS, Ferrer M, Strulovici B. Median absolute deviation to improve hit selection for genome-scale RNAi screens. J. Biomol. Screen. 2008;13:149–158. [PubMed]

14. Espeseth AS, Huang Q, Gates A, Xu M, Yu Y, Simon AJ, Shi XP, Zhang X, Hodor P, Stone DJ, et al. A genome wide analysis of ubiquitin ligases in APP processing identifies a novel regulator of BACE1 mRNA levels. Mol. Cell. Neurosci. 2006;33:227–235. [PubMed]

15. Haney SA. Increasing the robustness and validity of RNAi screens. Pharmacogenomics. 2007;8:1037–1049. [PubMed]

16. Pelkmans L, Fava E, Grabner H, Hannus M, Habermann B, Krausz E, Zerial M. Genome-wide analysis of human kinases in clathrin- and caveolae/raft-mediated endocytosis. Nature. 2005;436:78–86. [PubMed]

17. Zhang XD, Yang XC, Chung NJ, Gates A, Stec E, Kunapuli P, Holder DJ, Ferrer M, Espeseth AS. Robust statistical methods for hit selection in RNA interference high-throughput screening experiments. Pharmacogenomics. 2006;7:299–309. [PubMed]

18. Gunter B, Brideau C, Pikounis B, Liaw A. Statistical and graphical methods for quality control determination of high-throughput screening data. J. Biomol. Screen. 2003;8:624–633. [PubMed]

19. Newton MA, Noueiry A, Sarkar D, Ahlquist P. Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics. 2004;5:155–176. [PubMed]

20. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B-Methodological. 1995;57:289–300.

21. Storey JD. A direct approach to false discovery rates. Journal of the Royal Statistical Society Series B-Statistical Methodology. 2002;64:479–498.

22. Joyce JG, Hurni WM, Bogusky MJ, Garsky VM, Liang XP, Citron MP, Danzeisen RC, Miller MD, Shiver JW, Keller PM. Enhancement of alpha-helicity in the HIV-1 inhibitory peptide DP178 leads to an increased affinity for human monoclonal antibody 2F5 but does not elicit neutralizing responses in vitro - Implications for vaccine design. J. Biol. Chem. 2002;277:45811–45820. [PubMed]

23. Zuck P, Murray EM, Stec E, Grobler JA, Simon AJ, Strulovici B, Inglese J, Flores OA, Ferrer M. A cell-based beta-lactamase reporter gene assay for the identification of inhibitors of hepatitis C virus replication. Anal. Biochem. 2004;334:344–355. [PubMed]

24. Zhang XD. A pair of new statistical parameters for quality control in RNA interference high-throughput screening assays. Genomics. 2007;89:552–561. [PubMed]

25. Iversen PW, Eastwood BJ, Sittampalam GS, Cox KL. A comparison of assay performance measures in screening assays: Signal window, Z′ factor, and assay variability ratio. J. Biomol. Screen. 2006;11:247–252. [PubMed]

26. Sui YX, Wu ZJ. Alternative statistical parameter for high-throughput screening assay quality assessment. J. Biomol. Screen. 2007;12:229–234. [PubMed]

27. Zhang JH, Chung TDY, Oldenburg KR. A simple statistical parameter for use in evaluation and validation of high throughput screening assays. J. Biomol. Screen. 1999;4:67–73. [PubMed]

28. Zhang XD. Novel analytic criteria and effective plate designs for quality control in genome-wide RNAi screens. J. Biomol. Screen. 2008;13:363–377. [PubMed]

29. Zhang XD, Espeseth AS, Johnson EN, Chin J, Gates A, Mitnaul LJ, Marine SD, Tian J, Stec EM, Kunapuli P, et al. Integrating experimental and analytic approaches to improve data quality in genome-wide screens. J. Biomol. Screen. 2008;13:378–389. [PubMed]

30. Zhang XD, Roeder K, Wallstrom G, Devlin B. Integration of association statistics over genomic regions using Bayesian adaptive regression splines. Hum. Genomics. 2003;1:20–29. [PMC free article] [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of **Oxford University Press**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's Canada Institute for Scientific and Technical Information in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |