The application of high-throughput, unbiased, "systems" approaches to the study of host-pathogen relationships is shifting the focus from the pathogen to the response of the host during infection. A more global view of the physical, genetic and functional interactions that occur during infection will provide deeper insight into the regulatory mechanisms involved in pathogenesis and may eventually lead to new cellular targets for therapeutic intervention.
Currently, the vast majority of host-pathogen physical interaction data involves HIV, for which a large amount of physical binding information has historically been available, mostly from small-scale, hypothesis-driven experiments [1]. For example, the HIV-1 Human Protein Interaction Database (HHPID) maintained by NIAID contains over 2500 functional connections between individual HIV-1 and human proteins observed over 25 years of research, approximately 30% of which are classified as physical binding interactions [2]. Another database, VirusMINT [3], contains a collection of literature-curated physical interactions for several viruses, the vast majority of which correspond to HIV-1.
Several large-scale, systematic studies using the yeast two-hybrid methodology have recently been performed for several important human pathogens, including hepatitis C [4], Epstein-Barr [5], and influenza [6] viruses. Other approaches, such as those using Protein-fragment Complementation Assays (PCA) [7], protein arrays [8], or affinity tagging/purification combined with mass spectrometry (AP-MS) [9], which have been used successfully in other systems [10], have not been exploited to systematically interrogate host-pathogen physical relationships. We have, however, recently carried out the first systematic host-pathogen AP-MS study targeting HIV-1 using two different cell lines (HEK293 and Jurkat) (Jager et al., submitted), which will further increase the need for tools to visualize and integrate host-pathogen interaction datasets.
In addition to physical interaction studies, functionally important factors in HIV biology have also been identified by genetic or proteomic profiling screens. These studies do not necessarily identify physical binding partners of pathogen proteins; rather, they often implicate pathways or indirect "functional" associations. In 2008, three separate siRNA screens were published (Brass, Konig, and Zhou datasets) [14] that identified host genes required for efficient HIV infection. More recently, an additional RNAi screen was carried out using shRNAs in a Jurkat cell line, a potentially more physiologically relevant system (Yeung dataset) [17]. RNAi studies in mammalian cells are also providing new insights into the host response to a number of other pathogens, including hepatitis C [18], influenza [20], West Nile [24], and Dengue fever viruses [25].
Similarly, several mass spectrometry-based studies have examined protein expression levels in HIV-infected and uninfected cells. For example, Speijer and colleagues [26] used a 2D-DIGE approach to measure protein expression in the human T-cell line PM1 following HIV infection. Another study examined protein abundance changes in a CD4 cell line 36 hours post-infection [27], whereas the most recent study reports global, time-resolved protein level changes after infection in primary CD4 cells isolated from five donors [28].
At the most basic level, there are two different types of data (physical vs. functional), and each provides different insights into molecular mechanism. For example, genetic and proteomic profiling screens probing HIV-human interactions provide a wealth of data on genes and processes that contribute to pathogenesis but do not necessarily reflect direct physical connections. Conversely, methodologies that probe for physical interactions often miss crucial functional connections. Consequently, poor overlap is often seen when comparing datasets derived from these different but complementary platforms. However, even datasets collected using the same technology can show very low overlap. For example, although the initial HIV RNAi screens each identified approximately 300 genes [14], there was only a small (albeit statistically significant) overlap of three factors [29]. Several factors contribute to this lack of concordance, including differences in cell type (e.g., HeLa vs. HEK293T), the RNAi approaches and libraries used, and the phenotypic effects that were monitored. A comparison of all four genetic screens, including the most recent dataset derived from Jurkat cells using an shRNA library [17], finds no factor common to all of them (Figure 1A). In fact, only seven of the 252 genes in this dataset are shared with even one of the other genetic screens (p = 0.654). Similarly, only three proteins were shared among all three proteomic profiling datasets, although this overlap is still statistically significant (p < 10⁻⁵; Figure 1).
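The text does not state which statistical test produced the p-values quoted above. One common choice for judging whether two hit lists overlap more than expected by chance is a one-sided hypergeometric test, sketched below under loudly stated assumptions: both screens are treated as random draws from the same gene universe (the `N_BACKGROUND` value is hypothetical), and the gene lists are toy placeholders, not the published screen results. The function names `hypergeom_sf` and `pairwise_overlaps` are illustrative only.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k, n_background, n_a, n_b):
    """One-sided hypergeometric P(overlap >= k) for two gene lists of sizes
    n_a and n_b drawn from a universe of n_background genes."""
    total = comb(n_background, n_b)
    tail = sum(
        comb(n_a, i) * comb(n_background - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return tail / total


def pairwise_overlaps(screens, n_background):
    """Return (name_a, name_b, shared_count, p_value) for every pair of screens."""
    results = []
    for (name_a, genes_a), (name_b, genes_b) in combinations(screens.items(), 2):
        shared = len(genes_a & genes_b)
        p = hypergeom_sf(shared, n_background, len(genes_a), len(genes_b))
        results.append((name_a, name_b, shared, p))
    return results


# Toy, hypothetical hit lists; these are NOT the published screen results.
screens = {
    "screen_1": {"G1", "G2", "G3", "G4"},
    "screen_2": {"G2", "G3", "G5"},
    "screen_3": {"G3", "G6"},
}
N_BACKGROUND = 20000  # assumed size of the gene universe tested by every screen

for name_a, name_b, shared, p in pairwise_overlaps(screens, N_BACKGROUND):
    print(f"{name_a} vs {name_b}: {shared} shared, p = {p:.3g}")

# Genes hit in every screen (the central region of a Venn diagram).
print("shared by all:", sorted(set.intersection(*screens.values())))
```

This pairwise test only covers two lists at a time; judging a three- or four-way intersection would need a different approach (for example, a permutation test), and screens that used different libraries violate the shared-universe assumption.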
Figure 1 Numerous host factors have been identified for HIV by small-scale and high-throughput experiments, with little overlap between the various sources. (A) Venn diagram shows overlap from four HIV-based genetic screens [14-17]. Only three intersections show
In cases where multiple types of data are available, combining the diverse datasets to identify common pathways, processes, and complexes has been extremely illuminating. For example, one recent study combined genetic and physical interaction data to identify new regulators of Wnt/β-Catenin signaling in mammalian cells [31]. Another study carried out a meta-analysis of several host-HIV-1 datasets, integrated them with host protein-protein interaction databases, and reported significantly overrepresented clusters within a network of host-pathogen and host-host interactions as important functional modules involved in virulence [29]. A third recent study identified key processes and host cellular subsystems affected by HIV-1 infection by analyzing patterns of interactions in the HHPID, in combination with functional annotation and cross-referencing to global siRNA data [32].
To facilitate integration and exploration of the vast number of HIV-human interactions from different databases and data types, we have created a tool, termed GPS-Prot, that accesses all major HIV-1 and human interaction databases, offers an option to overlay functional data (e.g., genetic interactions), and requires only very basic user input to produce an integrated network. To our knowledge, this is the first tool to combine comprehensive HIV-1 and human physical/functional interaction data with a graphical viewer and web interface. Users can thus apply the GPS-Prot platform as a "global positioning system" to visualize any human-HIV-1 interaction in the context of its landscape of reported binding partners. We have also implemented a feature that lets users securely upload and view their own datasets of interest. The software uses a graphical interface based on TouchGraph LLC's Navigator program, which has been used for social networking applications and makes navigating and gathering information from large networks intuitive and rapid. We therefore suggest that GPS-Prot is well suited for novice users who want to quickly build human-HIV-1 interaction networks from the wealth of published information, or from their own datasets, and to expand the network around a particular protein of interest.
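GPS-Prot's actual implementation is not described in this section. As a rough illustration of the general idea only, the hypothetical sketch below merges edge lists from several sources into one undirected network (remembering which source reported each edge), attaches a functional overlay to individual proteins without adding edges, and expands the network breadth-first around a protein of interest. The class name `InteractionNetwork`, its methods, the source labels and the protein identifiers are all invented placeholders, not real database content.

```python
from collections import defaultdict


class InteractionNetwork:
    """Hypothetical sketch: merge edge lists from several sources into one
    undirected network, with an optional per-protein functional overlay."""

    def __init__(self):
        # protein -> {neighbor: set of source labels reporting that edge}
        self.adj = defaultdict(dict)
        # protein -> set of functional annotations (e.g. "RNAi hit")
        self.overlay = defaultdict(set)

    def add_source(self, label, edges):
        """Add (protein_a, protein_b) pairs, recording which source reported them."""
        for a, b in edges:
            self.adj[a].setdefault(b, set()).add(label)
            self.adj[b].setdefault(a, set()).add(label)

    def add_overlay(self, label, proteins):
        """Tag proteins with a functional label without adding any edges."""
        for p in proteins:
            self.overlay[p].add(label)

    def neighborhood(self, seed, depth=1):
        """Breadth-first expansion around `seed`; returns (nodes, edges)."""
        seen = {seed}
        frontier = [seed]
        for _ in range(depth):
            next_frontier = []
            for node in frontier:
                for nb in self.adj.get(node, {}):
                    if nb not in seen:
                        seen.add(nb)
                        next_frontier.append(nb)
            frontier = next_frontier
        edges = []
        for a in seen:
            for b, labels in self.adj.get(a, {}).items():
                if b in seen and a < b:  # report each undirected edge once
                    edges.append((a, b, sorted(labels)))
        return sorted(seen), sorted(edges)


# Placeholder identifiers: "V1" stands for a viral protein, "H1".."H3" for host proteins.
net = InteractionNetwork()
net.add_source("source_A", [("V1", "H1"), ("V1", "H2")])
net.add_source("source_B", [("V1", "H1"), ("H2", "H3")])
net.add_overlay("RNAi hit", ["H3"])

nodes, edges = net.neighborhood("V1", depth=1)
print(nodes)  # ['H1', 'H2', 'V1']
print(edges)
```

Keeping the source labels on each edge lets a viewer show which databases agree on an interaction, and keeping the functional overlay separate from the edges reflects the distinction drawn above between physical and functional data. Calling `net.neighborhood("V1", depth=2)` would pull in `H3` and its overlay annotation.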