The three siRNA screens (lists 1–3) together called 842 genes as diminishing HIV replication when knocked down, or 3.3% of all human protein-coding genes (Report S1, pp. 98–120). A total of 34 genes were called in at least two siRNA screens (Table 3). Three genes were called in all three screens (MED6, MED7, and RELA). The pairwise overlaps were statistically significant (p<0.024 for all pairs of screens), but the percentages of shared genes were quite modest, ranging from 3% to 6%. The Brass et al. and Zhou et al. screens (lists 2 and 3) both surveyed the entire HIV life cycle and studied infection in HeLa cells, and these two share the greatest overlap (6%). Genes already listed in the NCBI HIV interaction database made up 13.3%–18.3% of the genes called in each of the three siRNA screens, indicating highly significant enrichment (p<0.001), as reported previously.
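The significance of a pairwise overlap between two gene lists can be illustrated with a hypergeometric tail probability. The text here does not state which test produced the reported p-values, so the following is only a minimal sketch of one common choice. The function name `overlap_p_value` and all numbers in the demo are hypothetical and are not taken from the screens.

```python
from math import comb


def overlap_p_value(universe: int, size_a: int, size_b: int, shared: int) -> float:
    """Upper-tail hypergeometric probability of seeing at least `shared`
    genes in common when list B (size_b genes) is drawn at random from a
    universe that contains the size_a genes of list A."""
    total = comb(universe, size_b)
    largest_possible = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(shared, largest_possible + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not taken from the screens.
    p = overlap_p_value(universe=20000, size_a=280, size_b=300, shared=15)
    print(f"P(overlap >= 15 by chance) = {p:.2e}")
```

Under this model a small shared fraction can still be highly significant, because the overlap expected by chance between two lists of a few hundred genes drawn from the whole genome is only a handful of genes.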
Table 3. Genes Called in at Least Two siRNA Screens.
We then asked whether examining human genes identified in at least two siRNA screens achieved further enrichment relative to the NCBI HIV interaction database. Of the 34 genes on two or more lists, 11 were previously reported in the HIV interaction database (NUP153, CCNT1, CTDP1, CHST1, CD4, CXCR4, TCEB3, JAK1, AKT1, DDX3X, and RELA), or roughly 30% of the total, substantially higher than the 13%–18% identified in each single list alone. From this we infer that the newly identified genes called in two or more siRNA screens (Table 3) are more likely to be authentic new cellular cofactors for HIV infection. Transcriptional profiling analysis showed that 29 of the 34 genes were expressed in cells or tissues that express CD4 and a coreceptor and are therefore competent for HIV entry. Of the remaining five, CCNT1 (cyclin T1) is known to be expressed in T cells and so represents a false negative call in the expression data used. A comprehensive table of all genes identified in pairwise combinations of lists 1–12 is provided at the end of Report S1 together with the expression analysis (pp. 72–98).
Why did the three siRNA screens yield such different gene lists? One possible explanation is that the expression of host cell factors differed between the HeLa and 293T cells studied. However, analysis of transcriptional profiling data showed that >93% of the genes called as important for HIV infection in any one of the three studies were expressed in both cell types, so differences in expression are unlikely to account for most of the discrepancy.
Instead, variation due to 1) experimental noise, 2) timing of sampling, and 3) different filtering criteria likely explains some of the differences. Two replicates were available for analysis from the König et al. screen, allowing estimation of the variance. From this, the expected overlap of the top 300 genes in replicate screens could be simulated. Simulations with two or ten replicates per screen yielded 150 or 240 overlapping genes, respectively, illustrating that high variance reduces the overlap and that replication improves it.
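The effect of replication on the overlap of top-ranked genes can be illustrated with a toy simulation. This is a minimal sketch under assumed conditions, not the simulation used in the analysis: each gene is given a Gaussian "true" effect, each measurement adds independent Gaussian noise, and two independent screens are compared after averaging their replicates. The function name, the noise model, and all parameter values are hypothetical.

```python
import random


def simulate_top_overlap(n_genes: int, n_reps: int, top_n: int,
                         noise_sd: float, seed: int) -> int:
    """Number of genes shared by the top_n lists of two independent
    simulated screens, each averaging n_reps noisy measurements per gene."""
    rng = random.Random(seed)
    # Assumed model: true effects are standard normal (hypothetical choice).
    true_effect = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]

    def run_screen() -> set:
        # Average n_reps noisy measurements of every gene.
        scores = [
            sum(t + rng.gauss(0.0, noise_sd) for _ in range(n_reps)) / n_reps
            for t in true_effect
        ]
        ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
        return set(ranked[:top_n])

    return len(run_screen() & run_screen())


if __name__ == "__main__":
    for reps in (2, 10):
        shared = simulate_top_overlap(n_genes=20000, n_reps=reps,
                                      top_n=300, noise_sd=2.0, seed=1)
        print(f"{reps} replicates per screen: {shared} of 300 genes shared")
```

Averaging more replicates shrinks the noise on each gene's score, so the two ranked lists agree more closely. The exact counts depend entirely on the assumed noise level and should not be compared with the values reported above.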
A second source of variation was the time point analyzed, which differed among the published siRNA screens. Although data were not available for multiple time points for the HIV screens, data were available for a screen of influenza virus infection at three time points (S. Chanda, unpublished data). Analysis demonstrated that variation between time points was of the same magnitude as variation within time points and was partially independent of it.
A third source of variation is likely to be differences in the filtering thresholds used. We investigated the effects of different choices for the toxicity filter by reanalyzing the data of the König et al. screen using three different toxicity thresholds. In the first, no filter was applied (100% of genes were accepted for further analysis); in the second, only genes in the 50% least toxic group were considered; and in the third, only the 20% least toxic genes were considered. For each set, the 300 genes with the strongest reduction in HIV infection after knockdown were extracted and the overlap among sets was compared. Fewer than 150 genes out of 300 overlapped between the 100% and 20% sets, and the maximum between any pair was 222 genes, indicating that the final gene set called is very sensitive to the toxicity threshold chosen.
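The threshold comparison described above can be sketched as a two-step selection: keep a fraction of the least toxic genes, then take the strongest hits among those that remain. This is a minimal illustration, not the reanalysis itself. The tuple layout, the function name `top_hits`, and the six toy genes are hypothetical, and a lower infection score is assumed to mean a stronger reduction in HIV infection.

```python
def top_hits(genes, keep_fraction, top_n):
    """Select the top_n strongest hits among the least toxic genes.

    genes: list of (name, toxicity, infection) tuples, where a lower
    infection score is assumed to mean a stronger reduction in infection.
    keep_fraction: fraction of genes retained after sorting by toxicity.
    """
    by_toxicity = sorted(genes, key=lambda g: g[1])
    n_keep = int(len(by_toxicity) * keep_fraction)
    kept = by_toxicity[:n_keep]
    strongest = sorted(kept, key=lambda g: g[2])[:top_n]
    return {name for name, _toxicity, _infection in strongest}


if __name__ == "__main__":
    # Toy data for illustration only: (name, toxicity, infection).
    toy = [("A", 0.1, 0.2), ("B", 0.9, 0.1), ("C", 0.2, 0.5),
           ("D", 0.8, 0.3), ("E", 0.3, 0.9), ("F", 0.4, 0.4)]
    unfiltered = top_hits(toy, keep_fraction=1.0, top_n=2)
    filtered = top_hits(toy, keep_fraction=0.5, top_n=2)
    print(sorted(unfiltered), sorted(filtered), sorted(unfiltered & filtered))
```

In the toy data the strongest hit is also among the most toxic genes, so tightening the filter replaces it with a weaker but less toxic gene. The same mechanism would make a called set sensitive to the threshold whenever toxicity and apparent antiviral effect are correlated.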
Thus, variation between replicates, between time points, and in filtering thresholds all likely contributed to the differences between siRNA screens. Further differences also likely arose from the use of different siRNA libraries, cell types, and viral strains [5].
We next asked whether the three siRNA screens yielded host factors participating in similar cellular processes. We extracted overrepresented functional clusters for each screen (lists 1–3) using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) tool [15]. Overrepresented clusters were then filtered for significance (p<0.06 based on a geometric mean of all terms in a group), redundancy, biological relevance, and specificity. This yielded 24 functional groups, most containing contributions from factors identified in two or more screens (genes are cataloged in Table S1). For example, all three screens were enriched in factors involved in “Nuclear Pore/Transport” (21–24 genes each), which likely facilitate the trafficking of HIV complexes between cellular compartments, including the nuclear import of the HIV pre-integration complex, export of viral RNAs, or possibly synthesis of other required factors. Other functional classes or complexes identified included “DNA Repair”, “Ubiquitin-associated”, “Mediator Complex/Transcription”, “RNA Binding”, “GTP Binding”, and “Helicase”. Thus, a functional analysis of the three screens revealed greater overlap in gene ontology (GO) categories than was seen for individual genes.
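The significance filter applied to the functional clusters (geometric mean of the term p-values in a group below 0.06) can be sketched as follows. This illustration does not call DAVID and covers only the significance step, not the filters for redundancy, biological relevance, or specificity. The function names, the cluster names, and the p-values are hypothetical.

```python
from math import exp, log


def geometric_mean(p_values):
    """Geometric mean of positive p-values, computed in log space."""
    return exp(sum(log(p) for p in p_values) / len(p_values))


def significant_clusters(clusters, cutoff=0.06):
    """Keep clusters whose geometric-mean p-value falls below the cutoff.

    clusters: dict mapping a cluster name to the p-values of its terms.
    Returns a dict mapping each retained cluster to its geometric mean.
    """
    retained = {}
    for name, p_values in clusters.items():
        gm = geometric_mean(p_values)
        if gm < cutoff:
            retained[name] = gm
    return retained


if __name__ == "__main__":
    # Hypothetical clusters and p-values, for illustration only.
    example = {
        "cluster_a": [0.01, 0.04],
        "cluster_b": [0.5, 0.2],
    }
    for name, gm in significant_clusters(example).items():
        print(f"{name}: geometric mean p = {gm:.3f}")
```

A geometric mean lets one very significant term offset several weaker ones within the same group, which an arithmetic mean of p-values would not do.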