Unified Human Interactome (UniHI) (http://www.unihi.org) is a database for retrieval, analysis and visualization of human molecular interaction networks. Its primary aim is to provide a comprehensive and easy-to-use platform for network-based investigations to a wide community of researchers in biology and medicine. Here, we describe a major update (version 7) of the database previously featured in NAR Database Issue. UniHI 7 currently includes almost 350 000 molecular interactions between genes, proteins and drugs, as well as numerous other types of data such as gene expression and functional annotation. Multiple options for interactive filtering and highlighting of proteins can be employed to obtain more reliable and specific network structures. Expression and other genomic data can be uploaded by the user to examine local network structures. Additional built-in tools enable ready identification of known drug targets, as well as of biological processes, phenotypes and pathways enriched with network proteins. A distinctive feature of UniHI 7 is its user-friendly interface designed to be utilized in an intuitive manner, enabling researchers less acquainted with network analysis to perform state-of-the-art network-based investigations.
The study of molecular systems and networks is now a major field in biology and medicine. The goals of network-based investigations range from prioritization of candidate genes to determination of complex molecular mechanisms underlying a disease or a biological process (1,2). An essential prerequisite for these investigations is the availability of resources for molecular interactions in model organisms and humans. To address this need, various databases have been established in recent years (3). Especially for protein–protein interactions in humans, many initiatives and research groups have contributed large sets of data derived from the literature, high-throughput methods or computational prediction (4–15). In parallel, a wide range of dedicated programs for network analyses have also been developed (16–18).
However, it is a common experience that current resources and tools pose a considerable challenge to users, especially to researchers less acquainted with concepts in network biology. Frequently, users have to download, map, compile and integrate distinct data types to conduct network-based investigations. These activities require extensive knowledge of data processing and management. Thus, a salient ‘bottleneck’ exists for many interested researchers between the wealth of available molecular interaction data and their utilization.
This observation motivated us to develop a new version of the Unified Human Interactome (UniHI) database for the retrieval, analysis and visualization of human molecular interaction networks: UniHI 7. We provide a platform that enables (i) retrieval of an integrated set of interactions from the major resources, (ii) intuitive use of tools for network-based investigations and (iii) easy utilization of complementary data and information for analysis, evaluation and visualization of retrieved networks.
UniHI 7 integrates ~350 000 molecular interactions for more than 30 000 human proteins. It is based on a complete re-implementation of previous versions of UniHI, with widely extended scope and functionality. Besides protein–protein interactions from 12 different resources [including the HPRD (4), BioGrid (5), IntAct (6), DIP (7), BIND (8) and Reactome (9) databases, as well as four interaction maps produced by computational predictions (10–13) and two high-throughput yeast-2-hybrid (Y2H) screens (14,15)], UniHI 7 also comprises curated transcriptional regulatory interactions from three complementary databases: TRANSFAC (19), miRTarBase (20) and HTRIdb (21). In addition to these interactions, we integrated drug target information from DrugBank (22) that can be mapped onto the interaction network. A detailed description of the incorporated resources can be found on the UniHI 7 web-page and in the Supplementary Materials (Supplementary Table S1). Whereas former UniHI versions can primarily be regarded as integrated databases for human protein–protein interactions (23,24), additional strengths of UniHI 7 lie not only in the integration of regulatory interactions but also in its interactive analysis and visualization tools for molecular networks. Although there are other databases with integrated molecular interaction data (25–34), UniHI provides a distinct set of features ranging from simple filtering options to advanced network analysis tools (Supplementary Materials and Supplementary Table S2). The main application of UniHI 7 is the retrieval and examination of small to medium-sized local networks. It is ideally suited for researchers who want to explore the molecular context of a single protein or a select set of related proteins using a network-orientated approach.
The UniHI 7 database can be queried for molecular interactions of single or multiple human proteins. Various identifiers such as gene symbol, Entrez Gene, Uniprot and Ensembl IDs can serve as input. It is also possible to input gene and protein identifiers from the model organisms yeast (Saccharomyces cerevisiae), worm (Caenorhabditis elegans), fly (Drosophila melanogaster) or mouse (Mus musculus), which will be automatically mapped to human orthologs. This feature is convenient for researchers who work with these major model organisms and want to interrogate related human molecular networks. In total, 2977 yeast, 6922 worm, 7998 fly and 15 694 mouse genes were mapped to human orthologs included in UniHI 7 using information from the HGNC database (35). As identifiers for model organisms, Entrez Gene IDs can generally be used, as well as systematic names for yeast, WormBase IDs or gene symbols for worm, FlyBase IDs or gene symbols for fly and MGI identifiers or gene symbols for mouse. The list of identifiers and any other data uploaded by the user are stored only during the active session and are accessible only to the user.
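The ortholog mapping step can be pictured with a small sketch. It is not the UniHI 7 implementation: the lookup table `ORTHOLOGS`, the function `map_to_human` and the two example entries are hypothetical and stand in for the HGNC-derived mapping described above.

```python
# Hypothetical ortholog table: (species, model-organism identifier) -> human gene symbol.
# The two entries are illustrative only; UniHI 7 derives its mapping from HGNC.
ORTHOLOGS = {
    ("mouse", "Trp53"): "TP53",
    ("yeast", "YBR160W"): "CDK1",
}


def map_to_human(species, identifiers):
    """Split identifiers into those with a known human ortholog and those without."""
    mapped = {}
    unmapped = []
    for identifier in identifiers:
        human_symbol = ORTHOLOGS.get((species, identifier))
        if human_symbol is None:
            unmapped.append(identifier)  # reported back instead of silently dropped
        else:
            mapped[identifier] = human_symbol
    return mapped, unmapped
```

Returning the unmapped identifiers separately lets a user see which query genes have no human counterpart in the table before the network is retrieved.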
Whereas previous UniHI versions presented queried proteins, retrieved interactions and resulting networks in sequential order, UniHI 7 now displays them in parallel on four web-pages: ‘Proteins’, ‘Physical Interactions’, ‘Regulatory Interactions’ and ‘Network’. This display scheme enables users to readily switch between the different types of displayed information (Figure 1). The ‘Proteins’ page lists the proteins in UniHI 7 that match the input, together with the names of the databases or resources in which each protein is included. In addition, hyperlinks to Entrez Gene, Uniprot, RefSeq, OMIM and KEGG entries (if available) are given. On the ‘Physical Interactions’ and ‘Regulatory Interactions’ pages, the set of detected interactions is shown with additional information regarding their source, evidence, type and functional annotation. A crucial feature is that interactions can easily be traced back to the original resources and publications, and thus can be critically assessed by users. In addition, all the interactions displayed on these two pages can be downloaded as simple tables, which can be used as input for other computational tools.
The retrieved interactions are displayed as a network on the ‘Network’ page. For network visualization, we utilized the recently developed Cytoscape Web, which is a client-side application implemented in Flex/ActionScript and modeled after the popular Cytoscape software (36). To prevent the visualization tool from becoming unresponsive, certain automatic layout and filtering procedures are applied to larger networks (Supplementary Materials).
Information about proteins and interactions can be interactively explored in the network graphics, avoiding cumbersome comparison with the textual output. For instance, clicking on a network node provides information about the corresponding protein and links to other resources such as GeneCards (37) and GeneMania (38) for follow-up study. The displayed network can be exported as simple tab-delimited text or as an image in PNG or PDF format.
Molecular networks are inherently difficult to analyze and interpret. In fact, the sheer number of retrieved interactions for well-studied proteins (including many kinases and receptors) can be overwhelming for users (Figure 2a). To help users with these challenges, we have implemented several tools for filtering and inspecting networks, as well as for mapping and utilizing complementary data and information. The application of these tools can be customized to different research objectives and can assist in elucidating network structure and prioritizing candidate proteins for follow-up studies.
Interactions can be filtered by resource, published evidence (i.e. number of PubMed references), scale of experiment (i.e. small-scale or large-scale), type of derivation (i.e. literature, computational prediction or Y2H screens), connectivity (i.e. direct or indirect) and interaction type (i.e. binary or complex) (Figure 2b). These filtering options can be tailored to produce more reliable and specific networks, e.g. by including only interactions reported in multiple publications.
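As a minimal sketch of how such filters combine, the following example keeps only interactions that pass every selected criterion. The `Interaction` record, its field names and the `filter_interactions` function are assumptions made for illustration; they do not reflect the UniHI 7 data model, and only three of the filter types listed above are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """Hypothetical interaction record (field names are assumptions)."""
    protein_a: str
    protein_b: str
    source: str        # e.g. "HPRD" or "IntAct"
    pubmed_ids: tuple  # PubMed references supporting the interaction
    derivation: str    # "literature", "prediction" or "y2h"


def filter_interactions(interactions, min_publications=1,
                        derivations=None, sources=None):
    """Keep interactions that satisfy all of the given criteria."""
    kept = []
    for interaction in interactions:
        # Published evidence: count distinct PubMed references.
        if len(set(interaction.pubmed_ids)) < min_publications:
            continue
        # Type of derivation: None means "accept any".
        if derivations is not None and interaction.derivation not in derivations:
            continue
        # Resource: None means "accept any".
        if sources is not None and interaction.source not in sources:
            continue
        kept.append(interaction)
    return kept
```

For example, calling `filter_interactions(interactions, min_publications=2)` corresponds to the case mentioned above of including only interactions reported in multiple publications.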
UniHI 7 also stores gene expression in 19 different human tissue types derived from the Symatlas (40,41). Users can apply these data to highlight or exclude proteins (based on a chosen threshold level) to derive tissue-specific networks. In addition to using gene expression data stored in UniHI 7, users can upload their own expression data to filter and examine human molecular interaction networks. This feature can be applied to detect network proteins that have distinct expression patterns related to physiological processes or diseases. Two different types of expression data can be used: absolute gene expression, i.e. positive values for transcript levels such as those detected by Affymetrix GeneChips or RNA-Seq, and differential gene expression, i.e. changes in expression derived from two-color arrays or by subtraction of absolute expression measurements. Thresholds for differential expression data can be set as maximum P-values or minimum fold changes (Figure 2c).
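The two kinds of threshold can be sketched as follows. This is an illustration under stated assumptions, not the UniHI 7 code: the function names are hypothetical, differential expression is assumed to be supplied as a log2 fold change with a P-value per gene, and the fold-change threshold is assumed to apply to the absolute value.

```python
def expressed(expression, threshold):
    """Genes whose absolute expression level reaches the threshold."""
    return {gene for gene, level in expression.items() if level >= threshold}


def differentially_expressed(stats, max_p_value=None, min_abs_log2_fc=None):
    """Select genes from a mapping gene -> (log2 fold change, P-value).

    A threshold left as None is not applied.
    """
    selected = set()
    for gene, (log2_fc, p_value) in stats.items():
        if max_p_value is not None and p_value > max_p_value:
            continue
        if min_abs_log2_fc is not None and abs(log2_fc) < min_abs_log2_fc:
            continue
        selected.add(gene)
    return selected


def restrict_network(edges, proteins):
    """Keep only interactions whose two partners are both in the selected set."""
    return [(a, b) for a, b in edges if a in proteins and b in proteins]
```

Passing the result of `expressed` for one tissue to `restrict_network` yields a tissue-specific sub-network in the sense used above; a highlighting mode would mark the selected proteins instead of removing the others.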
In addition, gene lists, for example, derived from RNAi screens or high-throughput assays, can be uploaded and utilized for annotation and filtering of interaction networks. Together with its capacity to overlay expression data, this option makes UniHI 7 an efficient platform for network-based analyses in the ‘post-genomic’ era.
Small molecules (drugs) can influence the activity of single proteins and alter pathogenic mechanisms, and are of crucial importance in numerous therapeutic interventions. To facilitate identification of known drug targets in the retrieved networks, UniHI 7 provides relevant highlighting and filtering options (Figure 2b) as well as information about the drugs and their mechanisms of action. In total, information for 4203 drugs targeting 2139 annotated proteins was imported into UniHI 7 from the DrugBank database (22) (Supplementary Materials).
The functional relevance of networks is inherently difficult to assess. Hence, we have implemented a user-friendly integrated tool, which carries out enrichment analyses for molecular functions, biological processes, cellular location [as defined by Gene Ontology (42)], protein families [as defined by Pfam (43)] and pathways [as defined by KEGG (44)] of network proteins (Figure 2d). The significance of overrepresentation of network proteins in a Gene Ontology category, Pfam protein family or KEGG pathway is calculated using the hypergeometric test (which is equivalent to the one-tailed Fisher’s exact test) with the terms from the human genome as the background distribution. For significant terms (i.e. GO categories, Pfam families or KEGG pathways), the number of included network proteins, the P-value and the false discovery rate for enrichment are displayed in a table. The associated proteins can easily be highlighted in the network graphics by clicking on the corresponding term (Figure 2e and f).
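The hypergeometric test for overrepresentation can be written out in a few lines. The sketch below is a generic illustration rather than the R/Bioconductor code used by UniHI 7; the function and parameter names are hypothetical. The text does not state how the false discovery rate is computed, so the Benjamini-Hochberg adjustment shown here is an assumption.

```python
from math import comb


def hypergeom_enrichment_p(background_size, annotated_in_background,
                           network_size, annotated_in_network):
    """Upper-tail hypergeometric P-value, P(X >= annotated_in_network).

    X is the number of annotated proteins among network_size proteins drawn
    without replacement from a background of background_size proteins, of
    which annotated_in_background carry the term.
    """
    upper = min(annotated_in_background, network_size)
    favourable = sum(
        comb(annotated_in_background, i)
        * comb(background_size - annotated_in_background, network_size - i)
        for i in range(annotated_in_network, upper + 1)
    )
    return favourable / comb(background_size, network_size)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values (assumed FDR method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

The P-value is computed with exact integer arithmetic before the final division, which avoids rounding problems for large backgrounds at the cost of speed; a production tool would more likely rely on a statistics library.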
Finally, phenotype information can be assessed for network proteins in the new version of UniHI. For this purpose, we have integrated gene–phenotype associations, curated in the Mouse Genome Database (45) and mapped to their human orthologs, for several major phenotypes such as cardiovascular system or embryogenesis. In addition, we have collected genes linked to aging in humans from the GenAge database (46), genes associated with cancer from the Cancer Gene Census catalogue (47) and genes linked to human diseases from the OMIM database (48) (Supplementary Materials). As in the enrichment analysis described earlier, the UniHI user can assess whether phenotypic associations are overrepresented among network proteins, and highlight the associated proteins within the network. A help page with detailed descriptions of the different tools and sample outputs for typical analyses is available on the UniHI 7 webpage.
The architecture of UniHI 7 comprises a database layer and an application layer. The database layer is implemented using MySQL, an open source relational database management system. The application layer is implemented using a J2EE architecture including, e.g. JDBC to connect to the back-end database, DAO for interacting with the database and accessing data, and JavaServer Pages to generate web pages. Data retrieval from the database is performed using the Hibernate library. The communication between the client and the application layer is through a Tomcat server. To perform enrichment analyses, UniHI 7 connects via Rserve (http://www.rforge.net/Rserve/) to the R/Bioconductor software (17). Matching of gene and protein identifiers was carried out using information from HGNC (35) and applying the g:Convert web tool (49). UniHI 7 performs best with small- to medium-sized networks (with up to several hundred interactions). For larger networks, visualization and analysis become increasingly time consuming.
UniHI 7 is intended to serve as a bridge between resources for interaction data and more advanced software. It provides a user-friendly web-based platform to study networks underlying molecular mechanisms in human health and disease. Customization allows users to (i) adjust the set of included interactions, (ii) overlay retrieved sub-networks with other types of data, (iii) inspect networks for relevant genes, (iv) determine potential network functions and (v) identify associations with phenotypes. We hope that UniHI 7 will considerably facilitate network-orientated investigations for many researchers, especially for those who are new to this field.
Supplementary Data are available at NAR Online.
Funding for the presented work was provided by the Portuguese Fundação para a Ciência e a Tecnologia (FCT) [PTDC/BIA-GEN/116519/2010 and IBB/CBME, LA] and the CHDI Foundation [A-2666]. R.K. is the recipient of an FCT scholarship [SFRH/BPD/70718/2010]. D.A. was supported by the FCT [PTDC/BIA-BCM/117975/2010]. Funding for open access charge: Portuguese Fundação para a Ciência e a Tecnologia [PTDC/BIA-GEN/116519/2010].
Conflict of interest statement. None declared.
We would like to thank Christian Lopes and Gary Bader for their help with the integration of Cytoscape Web into UniHI 7, Paulo Martel for his support in the configuration of the server, Trudi Semeniuk for critical reading of the manuscript and both referees for their constructive comments.