We conducted two different high-content RNAi primary screens targeting the same set of 719 human kinases in a virus infection setting. The first screen aimed to identify host cell factors involved in HCV infection, the second screen focused on DENV. Both screens were carried out on cell arrays [20
]. Results of the HCV screen have previously been published using an analysis based on average intensities per spot [17
], and the screen has consequently been re-analyzed using a clustering approach on the raw microscopy data [19
]. Using this screen then, a comparison can be made between these previous approaches and the method proposed here. The DENV screen has not been published before.
Cell-to-cell variability in RNA interference screens
We observed considerable cell-to-cell variability in both screens, even within the same spot. Figure shows microscopy images for a negative control (scrambled siRNA) and the biological positive control CD81, in the HCV screen. Two color channels are shown for the same spot: Cell nuclei were stained using DAPI (left images) and a GFP readout was used for viral infection of the same cells (right images). Variability between individual cells is shown in Figure , which plots the distribution of log-GFP signal intensity values of the single cell measurements of negative (top) and positive (bottom) controls. Of note, in both cases two populations corresponding to infected and non-infected cells can be observed, but the positive and negative controls differ in the distribution of cells between the two groups. The solid brown line shows the log-GFP intensity distribution of all cells in the entire screen (without controls). Importantly, the negative controls exactly fit the shape of the curve, whereas CD81 does not - indicating that the full screen can be used as a quasi-negative control, and that most knockdowns in the screen have no effect on viral infection. The number of cells in wells with positive versus negative controls is roughly the same (HCV: 326.82 ± 5.44 vs. 338.32 ± 3.43 mean ± std. error, p = 0.11 using two-sided, two-sample Welch's t-test; DENV: 279.81 ± 7.64 vs. 287.0 ± 5.3; p = 0.46), indicating no significant apoptotic effect of the corresponding siRNA knockdowns.
Figure 1 Signal intensity measurements of controls. (a) Microscopy images for negative (scrambled) and positive controls (CD81) in the hepatitis C virus infection screen. The left panel shows the nuclei (DAPI) channel, the right panel the virus signal (GFP). A (more ...)
To further characterize cell-to-cell variability, we fitted a Gaussian mixture model to normalized log-transformed intensity values. This fit shows a clear bimodal distribution of the data with approximately normally distributed components (μ = 0.34 ± 1.25*10-4 non-infected and μ = -0.5 ± 3.63*10-5 infected cells for HCV; μ = 0.21 ± 1.71*10-4 non-infected and μ = -0.72 ± 1.71*10-4 infected cells for DENV). Estimation of the mixture coefficients for the positive and negative controls confirm that there are clear differences. For the positive controls of HCV (DENV) there is an uninfected cell component probability of pu+ = 0.61 ± 0.002 (pu+ = 0.42 ± 0.002), and for the negative controls pu- = 0.4 ± 0.001 (pu- = 0.21 ± 0.001). This confirms that siRNA knockdowns of genes required for HCV and DENV infection or replication (dependency factors) have a higher proportion of cells with weak signal intensities. Clearly, under optimal conditions (perfect transfection, knockdown and infection efficiencies), only background GFP intensity for positive and maximal intensity for negative controls would be expected.
Population context influences infection
We next studied the effect of population parameters on hepatitis C virus and Dengue virus infection, using a procedure similar to the one proposed by Snijder et al. [18
]. Notably, while the analysis done by these authors considers the effect of population context on viral infection in the absence of any RNAi perturbation, we here present for the first time results including knockdowns. Furthermore, results shown here are based on chambered coverglass slides (LabTeks), whereas Snijder et al. used 96 well plates, with considerable differences in cell numbers and cell densities between these two platforms.
We computed six different population features for each cell based on the DAPI and GFP stains: (1) Size of the cell, (2) number of cells in the same spot, (3) location of the cell in a local population (center or edge of a local cell population), (4) local cell density and (5) shape of the cell nucleus (1/circularity of the cell nucleus). We note that since no stain for the cell cytoplasm is available, nuclear size is used as an approximation to cell size, as previously proposed by Snijder et al. We furthermore calculated four technical features: (1) location of the cell in the spot (center of spot or at the border), (2) row effects of cell location (median signal intensity of all cells in corresponding row on LabTek), (3) column effect of cell location (median signal intensity of all cells in corresponding column on LabTek), and (4) plate effect (median signal intensity of all cells on corresponding LabTek).
Due to the large number of individual cells, we grouped cells with similar properties into 20 discrete bins per feature. For each of the 5 population context and 4 technical features, we then computed the average within-bin and between-bin standard deviation of viral infection. The ratio of these two values provides a direct measure to assess the influences of the corresponding population or technical parameter on viral infection. A similar procedure has been used by Snijder et al to compute the population effects.
Figure shows the explained standard deviation of viral infection for each of the five population and four technical features we computed. The greatest influence on signal intensity both in HCV and in DENV was due to plate effects, with 14.35% (HCV) and 38.9% (DENV) of the standard deviation explained by this parameter. This was followed by cell size (12.96% vs. 16.24% explained variation) as the second most important parameter in both screens. Spatial effects on the LabTeks amounted to between 4.29% to 11.9% of the total variation. Individual population features explained between 1.8% to 16.24% of the total variation. We note that the overall variance was higher in the DENV screen (coefficient of variation of raw signal intensities CV = 0.595 DENV, in comparison to CV = 0.546 HCV).
Figure 2 Results of the population context and technical feature computations. (a) Explained standard deviation of cell population context and technical features with respect to single-cell viral infection efficiency. Shown are the ratios of the between-bin standard (more ...)
Figure shows the relative importance of the five population context parameters on HCV and DENV infection. In both viruses, cell size dominates with 42% (HCV) and 33% (DENV) of the variance due to population context. Cell number (24% HCV, 28% DENV) and cell position in local population (21% HCV, 20% DENV) are the second and third most important parameters. Local cell density (7% in both viruses) and cell shape (6% HCV, 12% DENV) both still play an important role, but relatively minor when compared to other factors. This is in contrast to the previous findings of Snijder and co-authors, who reported that the location of a cell at the edge or in the middle of a cell cluster is the main contributing factor for dengue virus infection [18
]. This difference may be due to different conditions in 96 well plates versus on chambered coverglass slides (LabTeks), and probably also due to different cell lines used in the two experiments (Huh7.5 vs. HeLa).
Single cell based normalization and hit selection
Figure gives the mean and standard deviation for each of the twenty bins of the two most important population context features on both screens (for the other two, non-binary, population context features see Additional file 2
). This shows, for example, that in the DENV and HCV screen, the log signal virus intensity is smaller the larger the cells. In contrast, the signal intensity is higher the greater the cell number. To test whether the effects of the individual features on the virus signal intensities are linear, we used a Harvey-Collier test for linearity computed on the log signal intensities and the raw features (without binning). The results show that all features are significantly nonlinear (p-values ≤ 2.2*10-16
for all features of the DENV and HCV screen, except for the Spot border feature of HCV with a p-value ≤ 2.987*10-7
and the Column feature of DENV with a p-value ≤ 1.22*10-4
Due to these nonlinear effects of the population context and technical features, we used multivariate adaptive regression splines (MARS) on the full data (without binning) to estimate the influence of the features on HCV and DENV infection [21
]. The fitted model was then used to correct raw intensity values, by subtracting estimated effects from individual cell measurements.
We next developed a statistical test to identify siRNA knockdowns showing a significant effect on viral replication, borrowing ideas from gene set enrichment analysis (GSEA) as proposed by Sweet-Cordero et al. [22
]. This procedure is essentially a Kolmogorov-Smirnov test on running sums over ranked intensity values, see methods [24
]. We are looking for siRNAs that lead to a shift in the distribution of individual cell signal intensities away from the distribution of negative controls, towards an increased number of non-infected cells showing only background fluorescence. The basic principle employed in this procedure is to sort all cells in the entire screen according to their measured viral signal intensity. Two running sums
are then computed, counting the fraction of cells treated or not treated with a particular siRNA Gk
of interest as the signal intensity is gradually increased (see Equation 1 and Equation 2, methods). The difference DIF
between these two values measures the enrichment of cells with respective intensity values in the knockdown of interest (Equation 3). Figure shows DIF
values of the positive and negative controls over the indices of the sorted cell intensity values of a randomly selected plate of the HCV screen, illustrating the clear differences observable between positive and negative controls. Whereas the DIF
value for the positive controls is increasing until a maximum of about 0.2 is reached and then decreases again, the DIF
value of the negative controls is fluctuating around zero. The enrichment score ES
is defined as the maximum deviation of DIF
from zero. Figure shows the median enrichment score ES
for each siRNA in the whole screen. The red curve illustrates the ES
sorted by increasing order.
Figure 3 Results of the ES compuations. (a) Computed DIF values for the positive (blue) and negative (red) controls of a randomly selected plate from the hepatitis C virus screen. Note that four positive and seven negative controls were spotted on this plate. (more ...)
The Figures and show the distribution of the enrichment scores ES of all siRNAs in the HCV and DENV screens. Scores of the positive controls are indicated by blue diamonds, and for the negative controls by red circles at the top of the plot. Interestingly, while the positive and negative controls are perfectly separated for the HCV data, some of the positive controls in the DENV screen are not working properly. Since other quality indicators of the affected plates in the DENV screen were fine (correlation between replicates, other controls on the same plates, statistics of plate and location effects) but we generally observed higher variability in the DENV screen, we decided not to remove the full plate for the affected controls.
An interesting observation is the three peaks of the distribution in Figures and , which seems counter-intuitive. This tri-modality comes from summarizing the replicates using the median of the ES of the replicate measurements. Since there is an even number of replicates (twelve for HCV and six for DENV) the mean of the two central elements is used as median, and gives a value around zero, if exactly one half of the replicates has positive and the other half has negative ES values. The siRNAs that have the majority of replicates with positive or negative ES value occur in the right or left peak, respectively. This tri-modality effect is thus an artifact of summarizing an even number of replicates using the median, and does not occur when taking the mean - which however is less robust to outlier siRNAs.
The ES were then subjected to a nonparametric statistical test to identify gene knockdowns that significantly decrease viral replication, using a significance level of adjusted p-values of α = 0.05 and an ES larger than 1.5 times the standard deviation of the median over the replicates for the calculated ES of each LabTek.
Hepatitis C virus host dependency factors
Using the normalization for cell population context and subsequent hit scoring as described above resulted in a list of 54 host dependency factors significantly reducing HCV replication (see Additional file 3
). We compared identified hits to results obtained using a statistical analysis based on averaged intensity values per well (AVERAGE), as previously published [17
]. A z-score threshold of 1.5 and a significance level of 5% were used for hit identification in the AVERAGE method. We furthermore compared results with the analysis method based on Ripley's k function recently proposed by Suratanee and coauthors [19
], with significance level 5% and negative clustering scores (RIPLEY).
Surprisingly, there are considerable differences in identified hits, with only 6 gene knockdowns overlapping between all three methods, compare Figure (upper VENN-Diagram). Genes in the overlap consistently identified by all approaches are the positive controls HCV321
directly targeting the viral RNA genome, and CD81
, the main entry receptor required by HCV. The remaining three overlapping hits are phosphatidylinositol 4-kinase alpha (PI4KA
], casein kinase II subunit alpha (CSNK2A1
], which are known to be related to the HCV replication cycle and fms-related tyrosine kinase 4 (FLT4
) which has been suggested to play a role in HCV in earlier publications [19
]. Out of the 44 genes identified using the AVERAGE approach, 34 genes were also found using the cell-based method presented here (CELL-BASED), accounting for an overlap of 77.3% and indicating high agreement between CELL-BASED and AVERAGE results. On the other hand, only 10 of the 30 genes identified using the RIPLEY approach could also be confirmed with the other methods (33.3%), and only 8 out of 44 genes identified using AVERAGE were also found by RIPLEY (18.2%).
Figure 4 Results of the different analysis methods on the HCV data. (a) Upper plot: Venn-Diagram of the hits at the gene level using the CELL-BASED, AVERAGE and RIPLEY analysis methods. A total of 30 host dependency factors were identified using RIPLEY, 44 host (more ...)
Most dependency factors were identified using the CELL-BASED approach (54) (AVERAGE: 44 and RIPLEY: 30). While we expect higher statistical power when using individual cell data, the question arises how reliable and reproducible identified hits are. We therefore first evaluated how reliably the positive and negative controls were identified with the three approaches, by computing sensitivity and specificity and analyzing receiver operator characteristic (ROC) curves for all three analysis methods based on the individual control spots on each plate. ROC plots show tradeoffs between sensitivity and specificity for different thresholds on scores or p-values used for hit-calling. Random guessing would correspond to a diagonal line in the ROC plot, whereas a perfect predictor would yield sensitivity and specificity values of one. ROC curves can be summarized further by computing the area under the curve (AUC), which is a value between 0.5 (random guessing) and 1 (perfect classification). Figure shows ROC curves for AVERAGE, RIPLEY and CELL-BASED. Although all ROC curves are significantly better than random guessing, AUC values show that CELL-BASED outperforms the other two approaches in scoring controls almost perfectly (CELL-BASED: 0.989, AVERAGE: 0.95, RIPLEY: 0.87). Results are superior in recognizing positive and negative controls in terms of both sensitivity and specificity of. For the AVERAGE method a loess normalization was used to normalize for general trends between the mean viral signal intensities and the number of cells within one spot. Furthermore, b-score normalization was used to normalize against spatial plate effects. The AUC values show, that the AVERAGE method cannot minimize the introduction of false positive and false negative controls on a single spot level as well as the CELL-BASED method.
Since our analysis approach consists of two independent methods (normalization against the population context features using MARS and the statistical test based on the idea of GSEA) we assessed which of the two methods contributed the most to the increased performance of classifying the controls. To do this, we used each of the two methods independently on the HCV data. For the first method (MARS-ONLY) we took raw log virus signal intensity values and normalized them against the features. Then, we used RNAither [11
] to summarize the cell intensities of one spot using their mean and computed z-scores for each spot. By applying a threshold of 1.5 times the standard deviation of the z-scores of each replicate we defined hits on an individual spot level. This procedure thus exploits the single cell information for normalization, but nor for hit-calling. We note, that the calculation of significance levels for each spot is not possible in this method. For the second method (GSEA-ONLY) we calculated ES
on the raw log virus signal intensities and used the nonparametric statistical test based on permutations to calculate p-values for each spot - exploiting the full information of the individual cells in the statistical test. We used Bonferroni corrected p-values ≤ 0.05 and ES
≥1.5 times the standard deviation of median summarized replicate ES
for finding individual spots which significantly decrease viral replication. We performed for both methods a ROC analysis on individual control spots and computed AUC values. MARS-ONLY results in an AUC of 0.971 and GSEA-ONLY in an AUC value of 0.987.
In summary, both individual methods are able to yield improved results in comparison to the AVERAGE (AUC = 0.95) and RIPLEY (AUC = 0.87) methods. The combination of MARS-ONLY and GSEA-ONLY in our CELL-BASED approach gives the best result when compared to the stand alone methods.
In addition, we summarized replicate measurements of MARS-ONLY by taking their median and used a two-sample, two-sided Welch's t-test to define significance values for the individual siRNAs. We used an alpha threshold of 0.05 on uncorrected p-values and 1.5 times the standard deviation of z-scores to define significant siRNAs. The same was done for GSEA-ONLY, although p-values for individual siRNAs have been calculated based on the ES using the nonparametric test. Of the resulting hits for GSEA-ONLY, MARS-ONLY and CELL-BASED there are overlapping genes 35 (see Figure lower VENN-Diagram and S1). Among the 54 hits found using the combined CELL-BASED method 47 (87%) were also found with the two independent methods.
Genes previously identified by Reiss et al. using the AVERAGE method were tested in a secondary validation screen by the same authors. We used this screen to further assess our method. Using the subset of genes tested in the validation screen, we computed sensitivity and specificity for varying z-score thresholds for the validation screen, varying ES score thresholds and adjusted p-values ≤ 0.05 for selecting hits of the CELL-BASED method and varying p-values and negative clustering scores for the RIPLEY method. It can be noted that an evaluation of the AVERAGE method with this data is not possible, since this approach was used to select genes for the validation screen, and hence true and false negatives cannot be computed for AVERAGE (negatives were not included in the validation screen). Figure shows the AUC values over different increasing z-score thresholds on the validation screen, for CELL-BASED and RIPLEY. Again, CELL-BASED shows superior sensitivity and specificity (results of GSEA-ONLY and MARS-ONLY look similar to CELL-BASED and are thus not shown).
Pathway analysis and functional annotation of host dependency factors
We identified 54 host dependency factors (HDF) for HCV and 57 HDFs for DENV, see Additional files 3
. These were further mapped to KEGG and Biocarta, and functional enrichment tests were carried out to select significantly enriched processes using DAVID http://david.abcc.ncifcrf.gov/
, see Additional files 5
. We identified 20 pathways with a p-value smaller than 0.05 to be significantly involved in HCV using the hits identified with our CELL-BASED approach. This is ten times the number of pathways which are identified to be significantly enriched using the hits of the AVERAGE method. In this method, only two pathways were found, namely Purine metabolism, which was also identified with the CELL-BASED method, and Axon guidance. Using the hit list of the RIPLEY approach no pathways could be identified as significantly enriched. This clearly shows, that our approach results not only an increased sensitivity and specificity on an individual siRNA level, but also an increased sensitivity on the pathway level.
Enriched pathways for HCV include endocytosis, focal adhesion, signaling in the immune system, regulation of the actin cytoskeleton, and the ErbB
kinase signaling pathways. All of these pathways have previously been reported by Reiss and colleagues [17
] by pooling their screen with other previously published screens. Notably, with the approach presented here, we identified the same pathways without the additional information of other screens, again indicating higher sensitivity of the CELL-BASED approach. Several additional pathways were also identified, including purine metabolism, TLR
signaling and several cancer-associated pathways.
Additional file 6
shows enriched pathways of the DENV screen. Although the overlap between HCV and DENV HDFs at the gene level is only seven genes, significant overlap can be observed at the level of pathways. Enriched pathways again include focal adhesion, immune signaling, and the ErbB
kinase pathways, reflecting the close evolutionary relationship of the two viruses.
Population context of non-virus screens
Apart from virus infection, the impact of many other- if not all- "bioactive" treatments on cells most likely depends on the properties of the individual cells and their context. A pristine and ubiquitous example is the transfection of cells using liposomal reagents. Every wet lab biologist will know intuitively, that efficiency of transfection strongly depends on the confluency of the culture. While this is true and relevant for bulk measurements of larger formats, it becomes critical when dealing with the low cell counts typically found in the wells or spots of high-throughput screening formats. In an imaging-based screen of an innate immune signaling pathway, which shall be published elsewhere, cells were again reverse transfected with siRNA on spots of LabTek chamber slides. To determine the impact of gene silencing on the pathway under investigation, signaling was triggered by transfection of the cells with a defined stimulus and a few hours later, pathway activation in individual cells was assessed by microscopy of a fluorescent reporter. Across the whole screen (ca. 2.4 Mio. cells were analyzed), we could detect a strict correlation between population context and the rate of pathway activation, due to the vastly different susceptibility for liposomal transfection among cells growing in different micro-contexts. To quantify this we calculated the explained standard deviation of the rate of pathway activation by the population context features for each plate. The mean and standard deviation across replicated plates of the individual population features are for Cell Size: 8.1 ± 2.57, Cell Density: 9.24 ± 3.5, Cell Number: 8.8 ± 2.4, Cell Shape: 8.16 ± 2.75 and for Population Border: 8.9 ± 8.3. The correlation of each cell's local density for example, was already observable by the human eye and therefore had to be normalized in order to perform hit-calling.
The technical features average to 6.1 ± 1.9 (Row Effects), 4.4 ± 2.1 (Column Effects) and 4.5 ± 2.71 (Spot Border). Analysis has been done on the individual plate level and thus, for the feature addressing the plate effect, no binning has been performed. We calculated AUC values for the positive and negative controls given in the screen after normalizing against the population context and against the technical features, and applied our approach for statistical hit scoring as presented in this study. We received increased performance (AUC = 0.66) in comparison to an analysis without normalizing against the features (AUC = 0.58).