Twenty-five years after its discovery, human immunodeficiency virus type 1 (HIV-1) still has a major health and socioeconomic impact, particularly in developing countries. Approximately 33 million people worldwide are infected with HIV-1, with an estimated 2.5 million new infections and 2.1 million AIDS-related deaths occurring each year.1
The HIV-1 genome expresses 17 processed proteins: six structural proteins, matrix (MA), capsid (CA), nucleocapsid (NC; p7), p6, p2, and p1 derived from viral protease-mediated cleavage of the HIV-1 Gag polyprotein; three enzymes, reverse transcriptase (RT), which has both RT and ribonuclease H (RNase H) enzymatic activities, protease (PR), and integrase (IN) derived from cleavage of the HIV-1 Pol polyprotein by PR; two envelope glycoproteins, gp120 (SU) and gp41 (TM), derived from cleavage of the HIV-1 Env (gp160) protein by cellular proteases; two regulatory proteins, transactivator (Tat) and regulator of viral protein expression (Rev); and four accessory proteins, negative factor (Nef), viral infectivity factor (Vif), viral protein R (Vpr), and viral protein U (Vpu). Currently, FDA-approved anti-HIV drugs (http://www.fda.gov/oashi/aids/virals.html
) target only a limited number of proteins: RT, PR, IN, gp41, as well as the cellular protein CCR5, which, along with the CD4 receptor, serves as a co-receptor for HIV-1 entry into host cells.2
In addition, no HIV-1 vaccine has shown efficacy, despite considerable efforts in this area. Vaccines under investigation mainly rely on immune responses to HIV Gag, Pol, and Env, but often include regulatory (Rev and Tat) or accessory proteins (Vpr, Nef, and Vif).3
To make progress in understanding HIV-1 replication, and the mechanisms of viral pathogenesis, it is important to understand the interactions of HIV-1 proteins with the vast array of human cellular proteins exploited for virus replication.4
While these interactions can be direct viral-host cell protein-protein interactions, many are indirect, such as the regulatory interactions that alter expression of a human gene. An in-depth review and comprehension of these interactions enhance insight into HIV pathogenesis on a cellular level and are also essential for focusing efforts on new drug targets and for better understanding vaccine immune responses. Since 1984 the HIV/AIDS research community has published extensively on virus-host interactions, but unlike HIV sequence, resistance, and immunology data (http://www.hiv.lanl.gov
), no searchable database had been developed to efficiently retrieve information relevant to interactions between HIV-1 and human proteins. To fill this important gap, the Division of AIDS of the National Institute of Allergy and Infectious Diseases initiated the development of an HIV-1 Human Protein Interaction Database that serves as a freely accessible, continuously updated, and comprehensive resource for the scientific community. The goals of this report are to describe the development of the database, summarize its contents, and present a visualization of the catalogued data demonstrating the complexity of the HIV-1 human protein interaction network.
To facilitate the development of a database describing HIV-1 and human protein interactions, the annotation on the HIV-1 reference sequence (RefSeq nucleotide accession number NC_001802.1) was first improved and expanded. RefSeq5
is a database of nucleotide and protein reference sequences annotated and distributed by the National Center for Biotechnology Information (NCBI). The HIV-1 reference sequence was updated in order to provide functional information on the HIV-1 polyproteins and mature peptides and to use standard names. Explicit representation of each mature peptide product supports more specific and accurate cataloguing of the interactions. All references within the database are catalogued using this primary reference sequence and the associated protein accession numbers. Subsequently, literature indexed in PubMed was searched using keywords for each of the HIV-1 proteins [e.g., “HIV and (matrix or p17)”]. The article titles and abstracts that were retrieved were reviewed to identify papers describing interactions between HIV-1 and human proteins. Relevant publications were collected and used to manually catalogue interactions into a database containing the (1) protein names, (2) Entrez GeneIDs (species-specific gene identification numbers from Entrez Gene, NCBI's database for gene-specific information),6
(3) RefSeq protein accession numbers, (4) brief descriptions of the interactions, (5) PubMed identification numbers (PMIDs) of articles describing the interactions, and (6) keywords for searching the interactions. Since the articles were from peer-reviewed publications, all identified interactions were incorporated into the database without placing additional judgment on the scientific validity of the report. If two different reports had conflicting conclusions, a comment describing the data ambiguity was incorporated into the interaction descriptions. For the majority of publications the full-text paper was reviewed, but some interactions were catalogued based on abstract only (e.g., when abstracts contained complete descriptions of the interactions or full copies of the articles could not be obtained). Upon completion, the searchable database was uploaded into the HIV-1 protein interactions section of NCBI's Entrez Gene at http://www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions
. The database is cross-referenced to other NCBI resources7
through Entrez Gene, providing a discovery space for searching and understanding the interaction data. Continued cataloguing of interactions and updating of the database to include information from new articles are performed on a regular basis.
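The six catalogued fields listed above can be pictured as a single record type. The following is a minimal sketch only: the class name, field names, and the placeholder values are assumptions made for illustration and do not reflect the actual schema used at NCBI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionRecord:
    """One catalogued HIV-1-to-human interaction (hypothetical layout)."""
    hiv_protein_name: str         # (1) protein names
    human_protein_name: str
    hiv_gene_id: int              # (2) Entrez GeneIDs
    human_gene_id: int
    hiv_protein_accession: str    # (3) RefSeq protein accession numbers
    human_protein_accession: str
    description: str              # (4) brief description of the interaction
    pmids: tuple = ()             # (5) PMIDs of the supporting articles
    keyword: str = ""             # (6) keyword used for searching

    def is_multiply_supported(self) -> bool:
        """True when more than one distinct publication reports the interaction."""
        return len(set(self.pmids)) > 1


# Placeholder values only; none of these identifiers are real database entries.
example = InteractionRecord(
    hiv_protein_name="HIV_PROTEIN_X",
    human_protein_name="HUMAN_PROTEIN_Y",
    hiv_gene_id=1,
    human_gene_id=2,
    hiv_protein_accession="NP_PLACEHOLDER_1",
    human_protein_accession="NP_PLACEHOLDER_2",
    description="HIV_PROTEIN_X binds HUMAN_PROTEIN_Y (placeholder text)",
    pmids=(111, 222),
    keyword="binds",
)
```

Keeping the PMIDs as a collection on each record makes it straightforward to note when an interaction is supported by several reports.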
The HIV-1 human interaction data can be accessed at NCBI through three mechanisms: (1) Entrez Gene, (2) the dedicated web site, or (3) by file transfer protocol (FTP). Records in Entrez Gene that contain HIV-1 interaction data can be retrieved with the query: “hiv1interactions”[Properties] AND “Homo sapiens”[Organism]. The “Table of Contents” for each gene record provides a link to the section entitled “HIV-1 protein interactions” where data are presented as a table. For human gene records, the table includes a link to the gene record for the HIV-1 protein, the interaction comment, and links to the PubMed citation(s) supporting the comment. On HIV-1 records, the table reports the interaction keyword, provides a link to the corresponding human interactant gene record, and links to the supporting PubMed citation(s). The “Links” menu for each gene record also allows users to readily navigate from Entrez Gene to other NCBI resources to find additional data related to the interacting proteins. For example, gene expression data can be retrieved via the link to “GEO Profiles.”8
Similarly, all gene records that contain HIV-1 interaction data and that also have gene expression data in the GEO8
database can be identified in Entrez Gene using the query: “hiv1interactions”[Properties] AND gene_geo[filter]. Many other queries can be designed to retrieve and study similar subsets of the database. For instance, using specific Gene Ontology9
(GO) terms, queries can be built to find all HIV-1 interacting proteins that are located in the cytoplasm (e.g., “hiv1interactions” [Properties] AND cytoplasm[go]) or that are involved in stem cell maintenance (e.g., “hiv1interactions”[Properties] AND “stem cell maintenance”[go]). Furthermore, in addition to Entrez Gene, the HIV-1 human protein interaction database web site lists all interaction types described per HIV-1 gene. Reports include the HIV-1 protein, the interaction type, and the human protein (linked to the gene record). Report pages include a query interface that facilitates accessing gene records that have a specific type of interaction (such as activation) with a specific HIV-1 protein. Results from these queries return as a list with links to the gene records. The report interface also provides an option to download all, or a subset, of the data in a columnar text format. Finally, the complete dataset can be transferred by FTP from ftp://ftp.ncbi.nih.gov/gene/GeneRIF/
as the file hiv_interactions.gz.
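The Entrez Gene queries quoted above share a common form: the “hiv1interactions”[Properties] term combined with further restrictions by AND. As a minimal sketch, such queries can be composed programmatically and wrapped in a request URL for NCBI's E-utilities ESearch service. The helper names below are hypothetical, straight quotes replace the typographic quotes of the text, and whether a given query term is still honoured by the live service is an assumption that should be checked before use.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint; db, term, and retmax are standard parameters.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
HIV_PROPERTY = '"hiv1interactions"[Properties]'


def build_query(*extra_terms):
    """Join the HIV-1 interaction property with further Entrez terms using AND."""
    return " AND ".join((HIV_PROPERTY,) + extra_terms)


def esearch_url(term, retmax=20):
    """Return an ESearch URL against the gene database (no request is sent)."""
    params = {"db": "gene", "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


# The query strings described in the text.
human_query = build_query('"Homo sapiens"[Organism]')
geo_query = build_query("gene_geo[filter]")
cytoplasm_query = build_query("cytoplasm[go]")
stem_cell_query = build_query('"stem cell maintenance"[go]')

if __name__ == "__main__":
    for query in (human_query, geo_query, cytoplasm_query, stem_cell_query):
        print(query)
        print(esearch_url(query))
```

The sketch only builds strings; retrieving results would additionally require sending the request and parsing the response.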
For the entire database, over 100,000 journal abstracts published between 1984 and 2007 were identified by PubMed queries and further reviewed, leading to the identification of 3200 papers describing putative interactions between HIV-1 and human proteins. Table 1 summarizes the HIV-1 protein interactions catalogued from these papers (see supporting online materials for a listing of all interactions).10
A total of 1448 human proteins that interact with HIV-1, comprising 2589 unique HIV-1-to-human protein interactions, were identified. In addition, 5135 summary descriptions of the interactions were generated, with a total of 14,312 PMID references to the original articles that reported the interactions. Sixty-eight unique keywords (directional from HIV-1 protein to human protein) are associated with these descriptions. Keywords were selected based on the text in the original journal articles by identifying the most important functional keyword used by the authors to describe the interaction. Whenever possible, similar keywords were combined. For example, “downregulates” and “downmodulates” were combined to use the single keyword “downregulates.” The most pervasive keywords used in the database are “interacts with,” 17.0%; “upregulates,” 11.9%; “binds,” 11.7%; “activates,” 9.7%; “downregulates,” 7.8%; “inhibits,” 6.9%; “inhibited by,” 4.1%; “processed by,” 2.6%; “regulated by,” 2.1%; and “phosphorylated by,” 1.5% (see supporting online materials for a listing of all keywords).10
While it cannot be excluded that some of these interactions are nonspecific or reflect human error, 58% of the interactions were confirmed by more than one publication. Collectively, the catalogued interactions provide a unique collection of data generated from the available scientific literature.
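The keyword consolidation and the summary percentages described above can be illustrated with a short sketch. Only the “downmodulates” to “downregulates” mapping comes from the text; the function names and the toy input are assumptions for illustration, and the toy input is not drawn from the database.

```python
from collections import Counter

# Synonym map; only this one pair is stated in the text.
SYNONYMS = {"downmodulates": "downregulates"}


def normalize_keyword(raw):
    """Lower-case and trim a keyword, then fold known synonyms together."""
    key = raw.strip().lower()
    return SYNONYMS.get(key, key)


def keyword_percentages(keywords):
    """Percentage of records carrying each normalized keyword."""
    counts = Counter(normalize_keyword(k) for k in keywords)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: 100.0 * n / total for k, n in counts.most_common()}


def fraction_multiply_supported(pmids_per_interaction):
    """Share of interactions reported by more than one distinct PMID."""
    if not pmids_per_interaction:
        return 0.0
    supported = sum(1 for pmids in pmids_per_interaction if len(set(pmids)) > 1)
    return supported / len(pmids_per_interaction)


# Toy input, not taken from the database.
toy_keywords = ["binds", "downmodulates", "downregulates", "Binds "]
toy_pmids = [[101, 102], [103], [104, 104]]

if __name__ == "__main__":
    print(keyword_percentages(toy_keywords))
    print(fraction_multiply_supported(toy_pmids))
```

Counting distinct PMIDs, rather than raw references, avoids treating a repeated citation as independent confirmation.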
Summary of the HIV-1 Human Protein Interaction Database Contents
To demonstrate the complexity of the HIV-1 human protein interaction network, the catalogued data were visualized using InterView11
and Gene Ontology9
(GO) terms (Fig. 1). The Gene Ontology is a set of three structured controlled ontologies that describes gene products in terms of their associated cellular component (Fig. 1), biological process, or molecular function in a species-independent manner. GO terms were collected from Entrez Gene for each HIV-interacting human protein (see supporting online materials for alternative visualizations based on biological process and molecular function GO terms, and for distribution of interactions based on GO terms).10
Proteins without GO terms in the three ontologies were annotated as unknown by using the root ontology term (cellular component, 14%; biological process, 9.5%; and molecular function, 7.8%). These visualizations reveal the extent to which HIV-1 interacts with diverse human proteins and demonstrate that there are many examples of the virus interacting with human proteins that are part of the same functional category. Interestingly, the majority of interactions, 68%, are indirect (e.g., altered expression of a human protein), while only 32% are direct physical interactions (e.g., binding). In addition, 529 (37%) of the human proteins in the database were found to interact with more than one HIV-1 protein. For example, the signaling protein mitogen-activated protein kinase 1 (MAPK1) has a surprising range of interactions with 10 different HIV-1 proteins. MAPK1 is a member of the MAP kinase family involved in a wide variety of cellular processes such as proliferation, differentiation, transcription regulation, and development. Thus, it is likely that MAPK1 is intimately involved in many steps of the HIV-1 replication cycle. Similarly, mitogen-activated protein kinase 3 (MAPK3), protein kinase C-alpha (PRKCA), and interferon-gamma (IFNG) have been described as interacting with nine different HIV-1 proteins each, suggesting these proteins also play important roles in HIV-1 replication and pathogenesis.
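Counting the human proteins that interact with more than one HIV-1 protein amounts to grouping interaction pairs by human protein and counting the distinct viral partners. The sketch below shows that grouping on toy pairs with placeholder human protein names. The function names are hypothetical, and the final lines only check the reported arithmetic (529 of the 1448 human proteins, about 37%).

```python
from collections import defaultdict


def hiv_partners_by_human_protein(pairs):
    """Map each human protein to the set of HIV-1 proteins it interacts with."""
    partners = defaultdict(set)
    for hiv_protein, human_protein in pairs:
        partners[human_protein].add(hiv_protein)
    return dict(partners)


def multi_partner_proteins(pairs):
    """Human proteins that interact with more than one distinct HIV-1 protein."""
    partners = hiv_partners_by_human_protein(pairs)
    return sorted(name for name, hiv in partners.items() if len(hiv) > 1)


# Toy pairs (HIV-1 protein, human protein); the human names are placeholders.
toy_pairs = [
    ("Tat", "HUMAN_A"),
    ("gp120", "HUMAN_A"),
    ("Tat", "HUMAN_A"),   # a repeated pair does not add a new partner
    ("Nef", "HUMAN_B"),
]

# Arithmetic check of the figure reported in the text.
reported_share_percent = round(100 * 529 / 1448)

if __name__ == "__main__":
    print(multi_partner_proteins(toy_pairs))
    print(reported_share_percent)
```

Using a set per human protein ensures that several reports of the same viral partner are counted once.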
FIG. 1. Visualization of the HIV-1 human protein interaction network. The network was visualized with InterView.11 Gray ovals represent HIV-1 proteins. Colored circles represent human proteins and are shown clustering around the HIV-1 protein they interact with.
Table 2 summarizes the distribution and number of interacting HIV-1 proteins per cellular protein (see supporting online materials for a comprehensive listing of the specific HIV-1 proteins interacting with each cellular protein).10
Moreover, large numbers of interactions were published for the HIV-1 regulatory protein Tat, as well as for the envelope proteins: 30% and 33% of the total interactions identified, respectively. Of particular note is the cataloguing of 273 different nuclear proteins that interact with Tat, including over 40 transcription factors and regulators. Similarly, 219 extracellular and plasma membrane proteins were identified as interacting with HIV-1 gp120 and 67 were identified as interacting with gp41, including over 70 cellular receptors, integrins, and adhesion molecules. Overall, the database catalogues a wealth of information that can be mined to understand better the breadth of the HIV-1 human protein interaction network.
Number of Interacting HIV-1 Proteins per Cellular Protein
In conclusion, the HIV-1 human protein interaction network forms the basis of a detailed map for tracking the cellular interactions that drive HIV-1 replication and pathogenesis. Integration of the database into NCBI's online resources7
has significantly expanded the gene-specific information reported and provides a discovery space to the scientific community for researching and better understanding the impact of HIV-1 on the cell. Furthermore, efforts are underway to incorporate these data into Canada's Biomolecular Object Network Databank (BOND) (http://bond.unleashedinformatics.com
), a database cataloguing the interactions between all cellular proteins, which will provide a powerful tool for advancing the available knowledge of how HIV-1 replicates in the context of the whole cell and for broadening our understanding of how these interactions control HIV-1 replication and mediate pathogenic effects on infected cells. This set of known interactions between HIV-1 and human proteins is the first step toward new strategies for inhibiting HIV-1 replication (e.g., targeting of nonessential host-cell proteins).12
As an example of how this resource has already had a significant positive impact on HIV research, Brass et al.13
recently published a study describing the use of an RNAi screen that identified a set of 273 cellular proteins involved in HIV-1 replication, which they termed HIV-dependency factors (HDFs).14
The HIV-1 Human Protein Interaction Database was used extensively by the authors to support the categorization and analysis of the HDFs identified through this research. Similarly, a subset of the HIV-1 Human Protein Interaction Database that has already been uploaded into the Biomolecular Interaction Network Database15
(BIND; predecessor to BOND) supported the work of Dyer et al., who performed a broad analysis of human proteins interacting with viruses and other pathogens. The future integration of the human protein interaction network,17
cell type, disease stage, and other factors will deepen our understanding of the HIV-1 human protein interaction network and help accelerate the development of more effective antiviral and vaccine interventions.