Conceived and designed the experiments: CHC. Performed the experiments: KCC CHC. Analyzed the data: KCC TYW CHC. Contributed reagents/materials/analysis tools: CHC. Wrote the paper: KCC CHC.
AIDS is one of the most devastating diseases in human history. Decades of studies have revealed host factors required for HIV infection, indicating that HIV exploits host processes for its own purposes. HIV infection leads to AIDS as well as various comorbidities. The associations between HIV and human pathways and diseases may reveal non-obvious relationships between HIV and non-HIV-defining diseases.
Human biological pathways were evaluated and statistically compared against the presence of HIV host factor related genes. All of the obtained scores comparing HIV targeted genes and biological pathways were ranked. Different rank results based on overlapping genes, recovered virus-host interactions, co-expressed genes, and common interactions in human protein-protein interaction networks were obtained. Correlations between rankings suggested that these measures yielded diverse rankings. Rank combination of these ranks led to a final ranking of HIV-associated pathways, which revealed that HIV is associated with immune cell-related pathways and several cancer-related pathways. The proposed method is also applicable to the evaluation of associations between other pathogens and human pathways and diseases.
Our results suggest that HIV infection shares common molecular mechanisms with certain signaling pathways and cancers. Interference in apoptosis pathways and the long-term suppression of immune system functions by HIV infection might contribute to tumorigenesis. Relationships between HIV infection and human pathways of disease may aid in the identification of common drug targets for viral infections and other diseases.
Acquired immunodeficiency syndrome (AIDS) is a devastating disease that has afflicted the human species for decades. Despite the enormous amount of effort and resources devoted to its study, a cure for AIDS has not yet emerged. AIDS is caused by human immunodeficiency virus (HIV). As with other diseases caused by pathogens, various human pathways must be perturbed or even hijacked to serve the purposes of HIV. Indeed, hundreds of human host factors have been identified as necessary during viral infection and replication. Thousands of protein-protein interactions between HIV and human host proteins have been reported in the literature.
Certain diseases are known to be associated with HIV infection. For example, the association between HIV/AIDS and lymphoma/Kaposi's sarcoma has been recognized since the discovery of HIV. Tuberculosis, hepatitis B/C, and other diseases are known comorbidities of HIV infection, and HIV infection is even associated with neurocognitive disorders. These findings have led us to enquire into the human pathways and diseases that are associated with AIDS and the molecular mechanisms behind these associations.
Previous research has attempted to elucidate host-pathogen interactions through protein-protein interactions. Interactions between human proteins and several pathogens, including Hepatitis C virus , Epstein-Barr virus , influenza virus , and several strains of bacteria , were identified systematically. These studies suggested that interactions between humans and pathogens (viruses or bacteria) are extensive and prevalent. Several studies have also attempted to identify human biological processes that are influenced or perturbed by viruses , . These studies depicted human-pathogen interactions from a global perspective by pooling interactions with different pathogens and identifying common mechanisms playing important roles in viral and bacterial infections. One study specifically analyzed the interactions between HIV-1 and human proteins  and found that HIV targeted proteins that were not involved in human diseases listed in the Online Mendelian Inheritance in Man (OMIM).
To study the functional enrichment of genes (the association of genes with a specific function or pathway), gene set enrichment analysis (GSEA) and its derivatives are widely adopted , . In GSEA, genes are ranked by their correlations with phenotypes and an enrichment score (ES) is calculated to estimate whether genes from a gene set are clustered in the extreme regions (the bottom or top) of the ranked list. Some studies have applied GSEA to network/pathway analysis as well. For example, proteins in a protein-protein interaction network can be ranked by their degrees or by other centrality scores . Enrichment scores for pathways or other gene sets can be calculated based on the ranks and clusters of genes from these pathways. GSEA can also be applied to the evaluation of HIV/pathway associations, but genes must be ranked by their relatedness with HIV first. The selection of ranking criteria would impact the results of enrichment analysis.
In this work, we explored links between HIV infection and other human pathways of disease through several approaches: investigating the overlap of human genes involved in AIDS and other pathways, examining recovered human-HIV interactions in other pathways, studying co-expression profiles, and identifying common interaction partners in a human PPI network. All these approaches were undertaken with human genes associated with HIV and genes involved in pathways of disease. Two hundred twenty (220) human pathways involved in disease from the Kyoto Encyclopedia of Genes and Genomes (KEGG) were evaluated and statistically compared with HIV host factors. Many tests found significant associations between gene expression and HIV, and all test scores were transformed into ranks. Rank combination of these results led to a final ranking of HIV-associated pathways that provided insight into AIDS comorbidities, their underlying molecular mechanisms, and novel potential treatment strategies. Data fusion or the combination of multiple sources of information are techniques that have been applied to prioritize genes  or drug candidates . However, the application of these concepts to pathways is less common. To the best of our knowledge, this is the first study to combine the rankings of pathways through different approaches.
The HIV host factors identified among different studies are diverse. Figure 1 illustrates a Venn diagram of host factors identified from three systematic screening studies and from HIV-human protein interactions reported in the literature. Data from several sources can be merged with either set union or intersection operations. For the current study, the intersection approach was taken. As genes from our four sources were not balanced in terms of representation, the union of these data would make the results severely biased toward the largest set (HIV Interaction Database, 1,431 proteins). However, only one gene, RELA (a component of NF-κB), was consistently identified by all four sources. Therefore, genes identified by at least three sources were included for analysis, and twelve (12) host factors met this criterion (Table 1). These host factors were defined as a ‘core set’ for subsequent analysis in this work and are referred to as ‘host factors.’ The degrees (numbers of interactions) of these genes in HIV-human and human-human protein-protein interactions and their respective ranks are also shown. Most of these host factors were not ranked highly. The human protein that interacted with the most HIV proteins was the gene product of MAPK1 (mitogen-activated protein kinase 1), whereas the human protein that interacted with the most human proteins was UBC (ubiquitin C). However, neither protein was identified as an HIV host factor by the three systematic screenings.
Previous analysis of protein-protein interactions between human proteins and various viruses has shown that many pathogenic viruses interact with ‘hubs’ (high degree nodes) in the human interaction network –. However, ranking host factors by their degrees did not reflect this property. Among the 12 host factors studied, only two (RELA and AKT1, ranked 36.5 and 35, respectively) were ranked within the top 100 of 11,030 human proteins with current interaction data available. As for HIV-human interactions, only CD4 was targeted by multiple HIV proteins, and CD4 was ranked 6.5 among 1,431 human proteins with HIV-human interaction data available.
To understand the involvement of HIV host factors in biological processes, Gene Ontology (GO) annotations (biological processes) were compiled for host factors and compared to those of the entire human genome. For HIV host factors, ‘multi-organism process (GO:0051704)’, ‘immune system process (GO:0002376)’, ‘viral reproduction (GO:0016032)’, ‘response to stimulus (GO:0050896)’, and ‘biological regulation (GO:0065007)’ were significantly enriched (all with p-values<1×10−5, Figure 2). The definition of a ‘multi-organism process’ in Gene Ontology was: ‘Any process in which an organism has an effect on another organism of the same or different species (http://amigo.geneontology.org/cgi-bin/amigo/term_details?term=GO:0051704).’ Therefore, genes targeted by HIV are likely to be those involved in human-pathogen interactions. The enrichment of ‘immune system process’, ‘viral reproduction’ and ‘biological regulation’ is consistent with the behaviors of HIV and the consequences of HIV infection. The enrichment of ‘response to stimulus’ reflects the behaviors of cells in response to the binding or detection of the virus. These results are consistent with what is currently known about the virus, which includes its modulation of the immune system and its interference with cellular processes.
There are 220 human pathways available in KEGG. Among these, 86 are metabolic pathways and the others belong to signaling pathways or pathways of disease. None of the metabolic pathways ranks in the top 10 by all four rankings (Supplementary Table S1). Almost all of the metabolic pathways are ranked in the bottom half of the list, with the overall pathway (hsa01100: Metabolic Pathway) ranked last. This suggests that HIV host factors are not greatly involved in metabolic processes, which is consistent with our GO enrichment/depletion analysis (Supplementary Table S2). The association between each pathway and a set of HIV host factors was evaluated using several approaches. Pathways were then ranked by statistical tests in comparison with random pathways. The nature of each approach led to different rankings for these pathways. Six pathways were ranked in the top 10 in at least three rankings. These consensus pathways include ‘Pancreatic cancer (hsa05212)’, ‘Small cell lung cancer (hsa05222)’, ‘Acute myeloid leukemia (hsa05221)’, ‘Adipocytokine signaling pathway (hsa04920)’, ‘B cell receptor signaling pathway (hsa04662)’, and ‘T cell receptor signaling pathway (hsa04660)’ (Supplementary Table S1).
To further explore the consensus pathways identified by the four approaches to analysis, a data fusion method was applied. The correlations among different rankings were calculated and are listed in Table 2. Two approaches were highly correlated, namely ‘Common Genes’ and ‘Recovered Interactions.’ The other correlations were less obvious, suggesting that these approaches yielded diverse results. In principle, rank combination of diversified results leads to better rankings , . Based on these rank correlations, the ranks resulting from the four analytical approaches were combined as illustrated in Figure 3. The two most highly correlated rankings were combined first, as otherwise they would weigh too heavily when combined with the other rankings. The resulting three rankings were then combined again, resulting in the final ranking.
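The staged combination described above can be sketched as follows. This is a minimal illustration, not the exact procedure: it assumes that ranks are combined by arithmetic mean and that the intermediate ranking is not re-ranked before the second step, neither of which is specified in the text. The function and argument names are hypothetical.

```python
def mean_rank(rankings):
    """Average the rank of each pathway over a list of {pathway: rank} dicts."""
    pathways = rankings[0].keys()
    return {p: sum(r[p] for r in rankings) / len(rankings) for p in pathways}


def staged_combination(common_genes, recovered, coexpressed, common_partners):
    """Merge the two most correlated rankings first, then combine the three results."""
    # Step 1 (assumption: plain mean): merge 'Common Genes' and 'Recovered Interactions'.
    merged = mean_rank([common_genes, recovered])
    # Step 2: combine the merged ranking with the two remaining rankings.
    return mean_rank([merged, coexpressed, common_partners])


# Toy ranks for two hypothetical pathways, "A" and "B".
final = staged_combination(
    common_genes={"A": 2, "B": 8},
    recovered={"A": 14, "B": 1},
    coexpressed={"A": 5, "B": 33},
    common_partners={"A": 7, "B": 2},
)
ordered = sorted(final, key=final.get)  # lower combined score ranks first
```

Under these assumptions each of the two correlated rankings contributes a weight of 1/6 to the final score, whereas the other two contribute 1/3 each, which is what prevents the correlated pair from dominating.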
The top 10 KEGG diseases/pathways in the final ranking are listed in Table 3, along with their ranks and statistical significances as calculated by the four approaches. The six top-ranked consensus pathways were still ranked highly in the final ranking. However, four pathways were promoted by the combined ranking, namely ‘Chronic myeloid leukemia (hsa05220)’, ‘Toll-like receptor signaling pathway (hsa04620)’, ‘Chemokine signaling pathway (hsa04062)’, and ‘Apoptosis (hsa04210)’.
HIV particles must be granted entry into cells for successful infection and replication. It is thus understandable that ‘Chemokine signaling pathway’ was one of the top 10 pathways associated with HIV host factors. The glycoproteins gp160, gp120, and gp41 of HIV bind with CD4 and CXCR4/CCR5 on host cells before gaining entry into T cells. This binding triggers various signals throughout the cell, affecting the survival and migration of cells.
Three other pathways were involved in sensing and responding to viral infections, including ‘Toll-like receptor (TLR) signaling pathway’, ‘T-cell receptor (TCR) signaling pathway’, and ‘B-cell receptor (BCR) signaling pathway’. Activation of these pathways leads to immune responses including antigen processing and presentation, immunoglobulin production, and interferon-mediated antiviral effects. In some cases, activation of these pathways may also lead to autoimmunity.
Other gene expression-based studies have also identified pathways associated with HIV infection. Our findings were consistent with these studies, which reported pathways including ‘Apoptosis Pathway’, ‘Cytokine Responses’, and ‘Toll-like Receptor Pathway’.
The cancers identified in this work were not HIV/AIDS-defining cancers and were not known to have been caused by infectious agents. However, various population-based studies have shown that the risks of contracting many of these cancers are elevated in people with HIV/AIDS. An epidemiological study in France showed that the incidence of acute myeloid leukemia (AML) in HIV/AIDS patients was two-fold higher than that of the general population . One study in Germany suggested that long-term immune suppression increased AML risk . The clinical evidence for associations between chronic myeloid leukemia (CML) and HIV/AIDS is less clear, though some studies have suggested that HIV infections and highly active anti-retroviral therapy (HAART) may increase the risk of CML . Two studies in the United States and one in Denmark showed that the incidence of lung cancer increases in HIV-infected individuals  and that HIV infection is associated with an increased risk of lung cancer , . Two studies in France  and Italy  also found that pancreatic cancer deaths were significantly higher in populations with HIV/AIDS.
The association between HIV and the ‘adipocytokine signaling pathway’ was less clear. However, HIV protease inhibitors and other anti-retroviral therapies have been shown to alter human adipocyte differentiation and metabolism. The mechanism underlying the resulting lipodystrophy might involve mitochondrial toxicity and insulin resistance. This association was also noted in an RNAi systematic screening study.
Using a set of stringent and conserved host factors, it has been found that HIV does not always target ‘hubs’ or high-degree nodes in the human interactome. High-throughput screening of host-pathogen interactions may lead to interactions with already promiscuous proteins. Additionally, ‘hubs’ in a network are not necessarily involved in specific processes. Combining data from multiple sources reduced the number of false positives. Associations between a reliable ‘core set’ of HIV host factors and pathways or diseases may be more significant and specific, and reveal insights into the underlying molecular mechanisms of pathogenesis and comorbidities.
In conventional pathway enrichment methods (GSEA) all genes (host factors and genes in the human genome) must be ranked using a pre-specified criterion. Usually gene expression profiles of a certain phenotype (such as HIV infection) would be used. However, using this method, multiple factors or conditions cannot be considered together. Other than gene expression, the weight of evidence (number of independent studies reporting the gene being linked to the disease or condition) and degrees or centralities in protein-protein interaction networks could also be employed as ranking criteria. However, most of these criteria are unable to assign scores to all human genes, and would impact the calculations of enrichment scores and the ranking of pathways. Unlike the GSEA method, our method only requires a set of host factors. Associations between HIV and pathways are dependent on the set of HIV host factors. This is advantageous in terms of the computational complexity as the remaining genes in the human genome can be omitted from further study.
In this work, various cancer pathways were shown to be significantly associated with HIV. This observation is consistent with several studies investigating cancer risks in HIV/AIDS populations. Why is HIV associated with diverse types of cancers? HIV is known to integrate its genetic material into the host genome, which could be a cause of HIV-defining carcinomas. The random sites of integration of HIV might disrupt the expression of tumor-suppressor genes and alter the behaviors of cells. For other non-HIV-defining cancers, it is recognized that apoptosis (the killing of damaged cells) and senescence (the inactivation of damaged cells) play critical roles in tumorigenesis.
One concern over the associations revealed in this work is whether highly ranked pathways were simply those with more genes, as larger pathways may include more host factors by chance. The KEGG database contains various types of pathways, including ‘Metabolism’, ‘Genetic Information Processing’, ‘Environmental Information Processing’, ‘Cellular Processes’, ‘Organismal Systems’, and ‘Human Diseases’ . Whether certain types of pathways would cluster at the top of the ranking may cause concern for the validity of the ranking results. To address these issues, the numbers of genes in pathways were plotted against the ranks of those pathways (Figure 4). The resulting figure illustrates that ranks are not correlated with the numbers of genes in pathways. Other than ‘Metabolism’, which tends to rank low, most pathways do not exhibit obvious trends of clustering.
Many of the host factors studied were significantly involved in the apoptosis pathway, notably AKT1 and RELA (part of NF-κB). Apoptosis is a mechanism used by infected cells to control the spread of pathogens. Interactions between the HIV Tat protein and AKT1 and RELA inhibit apoptosis, and lead to the survival and proliferation of cells , . Activation of NF-κB in turn activates a number of survival genes. This strategy might help HIV to spread to other cells. The activation of survival genes might also inadvertently promote the growth and proliferation of cancer cells. Several cancer pathways highlighted in this work shared similar molecular machinery.
The pancreatic cancer pathway was ranked first in the final ranking. There has been little data reported on the association between HIV and pancreatic cancer , , which might be due to the low prevalence of pancreatic cancer in the general population and its resulting difficulty of study. HIV host factors involved in the pancreatic cancer pathway (hsa05212) are highlighted (Figure 5). Many of these genes play important roles in a central pathway (the EGF/EGFR/JAK1/AKT/NF-κB axis) that might lead to the survival and proliferation of cancer cells, as noted above. Additionally, highly active anti-retroviral treatments (HAART) may also negatively affect the pancreas . The cause of the increased incidence of pancreatic cancers in HIV/AIDS populations ,  is not clear; it is speculated that the introduction of HAART significantly prolonged the life-span of HIV/AIDS patients, which might contribute to increases in tumor-associated deaths .
To further elucidate the interactions between host factors and pancreatic cancers, 80 mutated genes implicated in pancreatic cancers were retrieved from a systematic screening survey . A network of interactions among HIV proteins, host factors, and mutated genes in pancreatic cancers was constructed (Figure 6). The resulting network illustrated the fact that HIV host factors do not interact with mutated pancreatic genes directly; instead, a set of ‘proxies’ or ‘hubs’ are connected with both sets of genes. Interactions from the HIV-human interaction database revealed that HIV proteins share more interactions with host factors and these ‘hubs’, and fewer interactions with genes mutated in pancreatic cancer. At first glance, these results might suggest that the association between HIV infection and pancreatic cancer arises from the ‘common interaction partner’ method used in this work. However, in the four approaches used to study these data, the pancreatic cancer pathway ranked 1st, 6th, 8th, and 1st, respectively, and these associations were all statistically significant (Table 3). Thus, the association was not solely determined by indirect human protein-protein interactions. The existence of ‘proxy’ genes in the interaction network suggests that HIV infections and pancreatic mutations might lead to common outcomes, notably the activation of anti-apoptotic and pro-survival signaling pathways.
Chronic immune suppression was shown to increase the incidences of various cancers , . HIV infection depletes CD4+ T-cells and macrophages, imposing a great impact on immune system functions. Recent studies revealed that CD4+ T-cells and macrophages are required in the clearance of senescent cells, which is critical to the prevention and regression of cancers . Without functioning immune systems and these immune cells, senescent cells promote tumor growth and metastasis, though the underlying mechanism for this promotion remains to be elucidated .
Notably, several anti-retroviral agents were shown to have anti-tumor activities, and were used to treat various types of cancers. Many HIV protease inhibitors also exhibited various degrees of kinase inhibition activity. For example, saquinavir, ritonavir, nelfinavir, and amprenavir were all able to inhibit phospho-Akt (AKT1 was one of the host factors studied) and interfered with various signaling pathways. Among these protease inhibitors, nelfinavir has the most potent anti-cancer activity and was tested in clinical trials against pancreatic cancer. Computational modeling and screening of human kinases revealed that nelfinavir inhibited multiple kinases, and its potent anti-tumor activity might come from this combined effect. Conversely, the tumor suppressor protein p21 (CDKN1A) was shown to confer HIV-1 resistance. This and other studies suggest that anti-tumor drugs, specifically cyclin-dependent kinase (CDK) inhibitors, might serve as novel HIV/AIDS treatments.
This work used a combined approach to identify associations between one specific pathogen (HIV) and human pathways. Various strategies are possible approaches to refining our method, such as comparisons of score combination and rank combination , and the use of a rank-score plot to identify the diversity of rankings and further improve combination results . The identification of several cancer pathways associated with HIV was consistent with epidemiological reports of comorbidities and increased cancer risks in the HIV/AIDS population. The involvements of host factors in various cancer-related pathways also suggested the existence of common drugs or treatment options, as exemplified by HIV protease inhibitors and other anti-retroviral agents , and CDK inhibitors , . Further investigations into the targets of anti-tumor drugs and their relationships with HIV host factors might reveal insights into novel treatment strategies for both HIV infection and cancers.
HIV host factors were collected from the Human, HIV-1 Interaction Database and several systematic screening studies. Overall, 1,998 genes were identified, and most (1,431) were contributed by the HIV Interaction Database. Among these host factors, twelve (12) were reported by at least three sources and were used as the set to be evaluated against the KEGG pathways.
Human, HIV-1 protein interactions were retrieved from the NCBI HIV-1, Human Protein Interaction Database . Gene Ontology annotations of these human proteins were retrieved from the NCBI GeneRIF database (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz). GO annotations have been assigned to GO terms one level below “Biological Process (GO:0008150)” using the “is_a” relationship in the Gene Ontology Database (revision: 1.2343, date: 24:10:2011). There were 24 terms in this level. For each term, the statistical significances of the proportional difference between the human genome and the set of HIV host factors were evaluated using a 2-sample proportion test.
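The 2-sample proportion test applied to each GO term can be sketched as below. This is a minimal sketch that assumes the common pooled-variance, two-sided form of the test; the text does not say which variant was used. The function name and the example counts are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-sample z-test for the difference between k1/n1 and k2/n2.

    Returns (z, two-sided p-value). Assumes the pooled proportion is neither 0 nor 1.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 9 of 12 host factors annotated with a GO term,
# compared with 2,000 of 20,000 genes in the genome.
z, p_value = two_proportion_z(9, 12, 2000, 20000)
```

A positive z indicates enrichment of the term among host factors and a negative z indicates depletion. With a set of only twelve genes the normal approximation is coarse, so the resulting p-values are best read as approximate.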
Human protein-protein interaction data were retrieved from the NCBI Interactions database (ftp://ftp.ncbi.nlm.nih.gov/gene/GeneRIF/, retrieved on Sep, 28, 2011). Eighty (80) genes mutated in pancreatic cancer were reported  and used to construct a protein-protein interaction network among HIV, host factors, and pancreatic cancer. None of these mutated genes overlapped with the 12 host factors. Protein-protein interaction networks were constructed and visualized using Cytoscape .
KEGG pathways and the genes that participate in these pathways were retrieved from the KEGG ftp site (ftp://ftp.genome.jp/pub/kegg/pathway/) . Several files in the KEGG ftp site provide mapping between genes and pathways. Entrez Gene IDs of human targets were used to link HIV proteins to their respective KEGG pathways.
In this work, four approaches were applied to evaluate associations between HIV host factors and KEGG pathways. The rationales and details for applying these approaches are outlined here.
The first approach counts the number of genes appearing both in the set of HIV host factors and in individual pathways. If a pathway includes many HIV host factors, the association between the pathway and HIV would be highly significant. However, ranking pathways by the numbers of shared genes may be misleading. Large pathways with more genes may include more host factors by chance. Therefore, a bootstrap method was applied to estimate the distribution of shared gene numbers in random pathways, and to evaluate the statistical significance of the pathways. Pathways were ranked by their statistical significance (z-scores) and not by the numbers of common genes. The same procedure was applied to all four approaches. Details of the statistical testing procedures are described below.
Host factors may contribute in different ways to virus-human interactions. Recovered interactions do not count the numbers of common genes, but do count the numbers of virus-human interactions. For example, two pathways with the same number of genes may both include three different host factors; the three host factors in pathway A may include eight human-virus interactions, and those in pathway B may only include five interactions. In this example, the association between HIV and pathway A would be stronger.
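The example above can be reproduced with a short sketch. The gene labels (H1 to H6), the viral protein labels (v0 to v2), and the function name are hypothetical placeholders, and interactions are assumed to be stored as (HIV protein, human gene) pairs.

```python
def recovered_interactions(pathway_genes, host_factors, hiv_interactions):
    """Count HIV-human interaction pairs whose human partner is a host factor in the pathway."""
    shared = set(pathway_genes) & set(host_factors)
    # Duplicated pairs are counted once.
    return sum(1 for _, human_gene in set(hiv_interactions) if human_gene in shared)


host_factors = {"H1", "H2", "H3", "H4", "H5", "H6"}
hiv_interactions = (
    [(f"v{i}", "H1") for i in range(3)]
    + [(f"v{i}", "H2") for i in range(3)]
    + [(f"v{i}", "H3") for i in range(2)]
    + [(f"v{i}", "H4") for i in range(2)]
    + [(f"v{i}", "H5") for i in range(2)]
    + [("v0", "H6")]
)
pathway_a = {"H1", "H2", "H3", "other1"}
pathway_b = {"H4", "H5", "H6", "other2"}

count_a = recovered_interactions(pathway_a, host_factors, hiv_interactions)
count_b = recovered_interactions(pathway_b, host_factors, hiv_interactions)
```

Both pathways share three host factors, yet pathway A recovers eight interactions and pathway B recovers five, so A would score higher under this measure.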
Some genes not in the host factor set may not have available human-virus interaction data. Co-expressions of these genes and host factors may provide another means by which to identify associations. Inference of gene associations through co-expressions has been widely adopted , . Gene expression profiles from BioGPS  have been used to construct co-expressed relationships. For each gene, the expression levels across various tissue types have been used as the ‘expression profile’ of this particular gene. If more than one probe mapped to the same gene, the expression levels for these probes were averaged and assigned to the specific gene. Two genes were considered to be co-expressed if the Pearson correlation coefficient of their respective expression profiles across different tissue types was greater than 0.85.
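The co-expression criterion can be illustrated with a minimal sketch. The 0.85 threshold and the averaging of probes come from the description above; the function names and the toy profiles are hypothetical, and each profile is assumed to be a list of expression levels ordered by the same tissues.

```python
from math import sqrt


def average_probes(probe_profiles):
    """Average, tissue by tissue, the profiles of all probes mapped to one gene."""
    n = len(probe_profiles)
    return [sum(values) / n for values in zip(*probe_profiles)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def co_expressed(x, y, threshold=0.85):
    """Two genes are co-expressed if their correlation exceeds the threshold."""
    return pearson(x, y) > threshold


gene_x = average_probes([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
gene_y = [2, 4, 6, 8, 11]   # rises with gene_x across tissues
gene_z = [5, 1, 4, 2, 3]    # no consistent trend with gene_x
```

Here `co_expressed(gene_x, gene_y)` holds, whereas `co_expressed(gene_x, gene_z)` does not.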
The functions of proteins can be predicted using their connectivity information in protein-protein interaction networks , . An association between two gene sets is considered to be strong if the two sets are connected by more common interaction partners between them. Common interaction partners of two genes are gene products that interact with both of the genes, excluding the two genes themselves (self-interacting homodimers). These common interaction partners were seen as ‘proxies’ or ‘bridges’ between two gene sets, and they represented indirect interactions between the two gene sets.
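A minimal sketch of the common interaction partner measure follows. It assumes an undirected interaction network stored as an adjacency map, and it counts the distinct proxies linking two gene sets; whether distinct proxies or proxy pairs were counted is not specified above, so this choice is an assumption. All names are hypothetical.

```python
from collections import defaultdict


def build_ppi(edges):
    """Build an undirected adjacency map {gene: set of partners} from edge pairs."""
    ppi = defaultdict(set)
    for u, v in edges:
        ppi[u].add(v)
        ppi[v].add(u)
    return ppi


def common_partners(gene_a, gene_b, ppi):
    """Gene products interacting with both genes, excluding the two genes themselves."""
    return (ppi.get(gene_a, set()) & ppi.get(gene_b, set())) - {gene_a, gene_b}


def count_common_partners(set_a, set_b, ppi):
    """Number of distinct proxies bridging any gene in set_a and any gene in set_b."""
    proxies = set()
    for a in set_a:
        for b in set_b:
            proxies |= common_partners(a, b, ppi)
    return len(proxies)


# Toy network: X bridges A and B, Y bridges A and C; A also self-interacts.
ppi = build_ppi([("A", "X"), ("B", "X"), ("A", "B"), ("A", "A"), ("A", "Y"), ("C", "Y")])
```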
For each human KEGG pathway, 1,000 random pathways with the same number of genes were generated. The resulting distributions were used to evaluate the statistical significance of HIV-KEGG pathway associations. The mean (μ) and standard deviation (σ) of each random distribution were calculated, and the z-statistic of the value observed for the actual pathway (x) was computed against them as z = (x − μ)/σ. The p-values were then estimated from the z-statistics.
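The resampling test can be sketched as follows, using the number of shared genes as the measure. This is a sketch under stated assumptions: random pathways are drawn without replacement from a background gene list, and the p-value is taken from the upper tail of a standard normal distribution. Neither detail is specified above, and all names are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def pathway_z_score(pathway_genes, host_factors, background, n_random=1000, seed=0):
    """Compare the observed overlap with overlaps from size-matched random pathways.

    Returns (observed, mu, sigma, z, p). z and p are NaN if sigma is zero.
    """
    rng = random.Random(seed)
    host = set(host_factors)
    pathway = set(pathway_genes)
    observed = len(host & pathway)
    pool = sorted(background)  # fixed order keeps the sampling reproducible
    null = [len(host & set(rng.sample(pool, len(pathway)))) for _ in range(n_random)]
    mu, sigma = mean(null), stdev(null)
    if sigma == 0:
        return observed, mu, sigma, float("nan"), float("nan")
    z = (observed - mu) / sigma
    p = 1 - NormalDist().cdf(z)  # assumption: one-sided, upper tail
    return observed, mu, sigma, z, p


# Toy data: 1,000 background genes, 12 host factors, and a 50-gene pathway
# that contains 10 of the host factors.
background = [f"gene{i}" for i in range(1000)]
host_factors = background[:12]
pathway = background[:10] + background[500:540]
observed, mu, sigma, z, p = pathway_z_score(pathway, host_factors, background)
```

The same function shape applies to the other three measures if the overlap count is replaced by the corresponding statistic.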
Genes and gene products were ranked by their degrees of interaction in human protein-protein interaction networks and human-HIV protein interaction databases. When genes or gene products had the same degree, an equal and averaged rank was assigned. For example, if three genes with N interactions were placed in 7th, 8th, and 9th places, then they each received an averaged rank of 8 (=(7+8+9)/3).
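The averaged-rank rule for ties can be sketched as below. The function name and the toy degrees are hypothetical; only the tie-handling rule is taken from the text.

```python
def average_ranks(degrees):
    """Rank genes by descending degree; tied genes share the mean of their positions."""
    order = sorted(degrees, key=degrees.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last gene with the same degree as the gene at position i.
        while j + 1 < len(order) and degrees[order[j + 1]] == degrees[order[i]]:
            j += 1
        shared_rank = ((i + 1) + (j + 1)) / 2  # mean of the 1-based positions i+1..j+1
        for gene in order[i:j + 1]:
            ranks[gene] = shared_rank
        i = j + 1
    return ranks


# Six genes with distinct high degrees, then three genes tied in 7th to 9th place.
degrees = {f"g{k}": 100 - k for k in range(6)}
degrees.update({"a": 50, "b": 50, "c": 50, "z": 1})
ranks = average_ranks(degrees)  # "a", "b", and "c" each receive rank 8.0
```

This rule also explains fractional ranks such as 36.5 and 6.5, which arise when an even number of genes share a degree.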
KEGG Pathways were ranked by z-statistics calculated from the 4 measures outlined above: the number of overlapped genes, the number of HIV interactions, the number of co-expressed genes, and the number of common interaction partners in the human interactome. When applicable, rank combination was applied to merge ranks into a final rank. For example, Pathway A was ranked 2nd, 14th, 5th, and 7th in 4 rankings, and Pathway B was ranked 8th, 1st, 33rd, and 2nd. After rank combination, their rank scores were 7 and 11, respectively. The rank of Pathway A therefore preceded that of Pathway B.
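The rank scores in this example are consistent with taking the arithmetic mean of the four ranks, which the following sketch assumes. The function name is hypothetical.

```python
from statistics import mean


def combine_ranks(rankings):
    """Map each pathway to the mean of its ranks (its rank score)."""
    return {pathway: mean(ranks) for pathway, ranks in rankings.items()}


rank_scores = combine_ranks({
    "Pathway A": [2, 14, 5, 7],   # (2 + 14 + 5 + 7) / 4 = 7
    "Pathway B": [8, 1, 33, 2],   # (8 + 1 + 33 + 2) / 4 = 11
})
final_order = sorted(rank_scores, key=rank_scores.get)  # lower score ranks first
```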
Supplementary Table S1. Rankings of KEGG pathways by various approaches and rank combination. Detailed information on the construction of rankings by the four approaches and by rank combination is included. For each approach, the means, standard deviations, z-statistics, p-values and ranks are provided. Ranks are based on z-statistics. The 220 KEGG pathways were sorted by combined ranks.
Supplementary Table S2. Enrichments and depletions of Gene Ontology biological processes. Proportional differences in GO biological processes between the human genome and a set of HIV host factors were tested; z-statistics and p-values are provided. These GO processes were sorted by z-statistics. GO processes enriched in HIV host factors were placed at the top.
Competing Interests: The authors have declared that no competing interests exist.
Funding: KCC was supported by National Science Council (NSC), Taiwan, grant No. NSC-100-2221-E-320-006. CHC was supported by Tzu Chi University and National Science Council (NSC), Taiwan, grant No. NSC99-2113-M-320-001-MY2. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.