Merging several pathway data sources gives better coverage of existing information but is difficult because the member pathways are heterogeneous. A simple solution reduces each pathway to its gene members. Although this discards information that could be used to cluster pathways, such as directionality, our methods still achieve consolidation both de novo and with experimentally focused data. The resulting pathway concepts are made up of pathways that share some functional characteristics. Using several consolidation methods gives different views of the same data. For example, PPI and WC consolidate on the basis of interactions, which could suggest mechanisms for how pathways are interconnected within a single concept. GOSlim, on the other hand, focuses on functional connections according to Gene Ontology (GO) annotations. Even within a single de novo method we provide several cut-offs, which capture similarities ranging from general to very specific between the pathways in a cluster. Each method has weaknesses as well as strengths. The EC method is limited to enriched pathways and will miss any non-enriched pathway. Because each of the other methods uses a single measurement to assess similarity, it will sometimes miss existing connections when that measurement does not fit the data.
For most research, EC and WC will provide the best overview of pathway concepts related to experiment-specific enriched pathways and to all pathways, respectively. Because EC is tightly aligned with enrichment results, researchers interested only in connections between enriched pathways will typically rely on this method. However, because WC includes non-enriched pathways, it can also reveal connections between enriched and non-enriched pathways.
As discussed further below, WC failed for Methionine Degradation I in our random testing even though the testing was based on interactions. This was due to the lack of interactions among its few gene members. However, as we also saw in Table , this failure was mitigated by the successful concept development of the same pathway using EC. While we recognize that the goal of many pathway analysis tools is to find a single result with a confidence score, Pathway Distiller seeks to facilitate exploration of multiple results to help a user find distinct functional connections between the pathways in a pathway concept.
Pathway Distiller's supplied resultant gene set (the cisplatin resultant gene set) is a good example of one of the challenges that our pathway enrichment consolidation addresses. It is difficult to draw conclusions from the 177 pathways with p-values ≤ 0.05. As shown in Table , the Enrichment Consolidation (EC) method reduces these to 12 pathway concepts, each represented by the most enriched of its consolidated pathways. This focuses attention and improves readability. While the EC method groups only enriched pathways, the Weighted Consolidation (WC) method groups all pathways with at least one resultant gene. Of the 2,462 human pathways, for example, 1,456 contain at least one of the cisplatin resultant genes, and these are grouped into 318 pathway concepts. Although this is many more concepts than EC produces, it describes the resultant gene set at a different precision. Because WC is not limited to enriched pathways, enriched and non-enriched pathways are combined. The result tables in the web-based Pathway Distiller are sortable by column, so the lowest p-value for each pathway concept is easy to find. For the cisplatin resultant gene set, 71 pathway concepts contain at least one enriched pathway, which is many more than the 12 concepts generated by the EC method yet few enough to browse for interesting information.
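The bookkeeping described above (one representative per concept, and the lowest p-value per concept) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `summarize_concepts`, the input dictionaries, and the toy pathway names and p-values are all hypothetical and are not part of Pathway Distiller; only the 0.05 cut-off comes from the text.

```python
from collections import defaultdict

ALPHA = 0.05  # enrichment cut-off quoted in the text


def summarize_concepts(pathway_pvalues, pathway_to_concept, alpha=ALPHA):
    """For each concept holding at least one enriched pathway, return
    (representative pathway, its p-value, number of member pathways).

    Hypothetical helper: the representative is simply the member with the
    lowest enrichment p-value.
    """
    members = defaultdict(list)
    for pathway, concept in pathway_to_concept.items():
        members[concept].append(pathway)

    summary = {}
    for concept, pathways in members.items():
        best = min(pathways, key=lambda p: pathway_pvalues[p])
        if pathway_pvalues[best] <= alpha:  # keep concepts with an enriched member
            summary[concept] = (best, pathway_pvalues[best], len(pathways))
    return summary


# Toy input (made-up names and p-values, for illustration only).
toy_pvalues = {"pathway_A": 0.001, "pathway_B": 0.2, "pathway_C": 0.6,
               "pathway_D": 0.04, "pathway_E": 0.03}
toy_concepts = {"pathway_A": 1, "pathway_B": 1, "pathway_C": 2,
                "pathway_D": 3, "pathway_E": 3}

result = summarize_concepts(toy_pvalues, toy_concepts)
for concept, (rep, p, size) in sorted(result.items()):
    print(concept, rep, p, size)
```

In the toy input, concept 1 mixes an enriched and a non-enriched pathway and is kept, while concept 2 has no enriched member and is dropped, mirroring how WC concepts can be filtered down to those with at least one enriched pathway.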
Often it is important to incorporate specific enrichment values within a single experimental design, for instance across time points in a single cell line treated with a compound. Across diverse experiments, however, the most enriched pathway in one experiment might have no relationship to another experiment. A de novo method that consolidates pathways independently of specific experiments, yet still handles the problems of set matching and numerical identification of subset similarity, allows empirical comparison and consistent results. As in the WC method, both enriched and non-enriched pathways are included; however, all gene members of each pathway, not only the resultant genes, were used to determine clusters. Table highlights the variation between the clustering methods and, within the de novo methods, between different cut-off values.
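To make the ideas of set matching and subset similarity concrete, the sketch below compares pathways by their gene members alone. The text does not state which similarity score or clustering rule the de novo methods use, so everything here is an assumption for illustration: the Jaccard index and overlap coefficient are standard set measures swapped in as examples, the single-linkage grouping is a simple stand-in, and the pathway names, gene identifiers and cut-off are made up.

```python
def jaccard(a, b):
    """Shared genes divided by all genes in either pathway."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """Shared genes divided by the size of the smaller pathway.

    Scores 1.0 when one pathway is a subset of the other.
    """
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def cluster_by_overlap(pathways, score, cutoff):
    """Group pathways linked by score >= cutoff (single linkage, union-find)."""
    names = list(pathways)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if score(pathways[x], pathways[y]) >= cutoff:
                parent[find(x)] = find(y)

    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Toy pathways with made-up gene identifiers; P2 is a subset of P1.
toy = {"P1": {"g1", "g2", "g3", "g4"}, "P2": {"g1", "g2"}, "P3": {"g9", "g10"}}

by_jaccard = cluster_by_overlap(toy, jaccard, cutoff=0.6)
by_overlap = cluster_by_overlap(toy, overlap_coefficient, cutoff=0.6)
print(by_jaccard)
print(by_overlap)
```

At the same cut-off the two scores disagree: the Jaccard index (0.5 for P1 and P2) keeps all three pathways apart, whereas the overlap coefficient (1.0) merges the subset P2 with P1. This is one way a choice of measurement and cut-off can change the resulting clusters.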
Measuring the number of interactions among a set of genes gives an estimate of their functional relatedness. The probabilities in the first column of Table indicate that, for the pathway concepts found using the EC method, the interactions are unique among most pathway concepts and are not due to random sampling of genes for the pathway concept. In almost all cases, then, the pathways in a concept are not combined without some relationship among them, as measured by interactions. The second column measures the randomness of the interactions in the matching clusters of the WC method. This method of validation fails for three of the pathway concepts we tested, two in EC and one in WC, which illustrates the advantage of not using a "one size fits all" approach to consolidation. The two pathway concepts that fail for EC (Nucleotide Metabolism and Vitamin C in the Brain Pathway) both have few pathway gene members (6 and 4, respectively) and few interactions (1 and 0, respectively). Both matching clusters were significant for the WC method because each concept contains a slightly different set of member pathways. Interestingly, Methionine Degradation I (to homocysteine) fails in WC but was significant in EC, again because of a slightly different set of member pathways. It is surprising that a concept clustered by weighted interactions fails an interaction-based validation, until one considers that there was only a single interaction among the genes in the concept and that this one interaction was very specialized to the pathway members of the concept.
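The validation idea above, asking whether the interactions within a concept could arise from randomly sampled genes, can be sketched as a simple resampling test. This is a minimal sketch under stated assumptions, not the procedure used for the table: the function names, the number of samples, the add-one correction, and the toy gene identifiers and interactions are all hypothetical.

```python
import random
from itertools import combinations


def count_interactions(genes, interactions):
    """Count gene pairs within `genes` that appear in `interactions`
    (a set of frozensets, each holding two gene identifiers)."""
    return sum(1 for a, b in combinations(sorted(genes), 2)
               if frozenset((a, b)) in interactions)


def empirical_p(concept_genes, background_genes, interactions,
                n_samples=1000, seed=0):
    """Fraction of random gene sets of the same size with at least as many
    interactions as the concept (add-one corrected)."""
    rng = random.Random(seed)
    observed = count_interactions(concept_genes, interactions)
    background = sorted(background_genes)
    k = len(concept_genes)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(background, k)
        if count_interactions(sample, interactions) >= observed:
            hits += 1
    return (hits + 1) / (n_samples + 1)


# Toy data: 50 made-up genes; only g0..g3 interact with one another.
background = {f"g{i}" for i in range(50)}
interactions = {frozenset(pair)
                for pair in combinations(["g0", "g1", "g2", "g3"], 2)}

p_dense = empirical_p({"g0", "g1", "g2", "g3"}, background, interactions)
p_sparse = empirical_p({"g10", "g11", "g12", "g13"}, background, interactions)
print(p_dense, p_sparse)
```

The second toy concept has no interactions at all, so every random sample matches or exceeds it and the estimate is exactly 1.0. A test of this kind cannot support a small concept with zero or one interactions, whatever the reason its pathways were grouped, which is consistent with the failures described for the concepts with few gene members.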
Tables , , , highlight the strengths of our consolidation methods. The tables should be read column-wise, and they include dividers to facilitate this. In other words, the divisions in a column show how one clustering method groups the pathway rows. Each division configuration represents a different way to cluster the rows of pathways.
From Table , one would expect pathways with a common relationship to p53 to be grouped together. Gene Ontology divides the 7 pathways into two clusters, which are meaningful in terms of normal cells and cells dealing with DNA damage response. Protein Interactions and the EC method group all pathways together. Gene Membership and the WC method group the 7 pathways into three and four pathway concepts, respectively. The WC method follows the Gene Membership grouping except that it separates the p53-dependent and p53-independent pathways, offering a different precision of pathway representation.
Not only does pathway consolidation make the data more manageable, readable and publishable, it could also focus attention on pathways not previously considered. For example, Ravi et al. determined 13 pathways related to DNA damage response; the 11 corresponding human pathways are shown in Table . Similar to their findings, when we used the human ortholog resultant gene set for their RNAi screen, not all of the pathways were enriched, so simple enrichment methods would fail to draw attention to some of these pathways. Our different but complementary consolidation methods can group enriched and non-enriched pathways together. For example, Gene membership100 finds Base Excision Repair, Nucleotide Excision Repair, Mismatch Repair, Homologous Recombination Repair (HRR) and DNA damage response all together in a single concept (■). Only Nucleotide Excision Repair would have been apparent in the initial pathway enrichment step, with a p-value of 1.99 × 10⁻⁴. But by looking at the pathways clustered with Nucleotide Excision Repair in the Consolidation Results grid, one might find connections between enriched and non-enriched pathways. Ravi et al. created a similar hypothesis of connections between these pathways by hand. In this example, all pathways except Glutathione Metabolism are contained in the same concept in at least one method, and some are in the same concept for all methods. Because Glutathione Metabolism is never clustered with the other pathways, it offers a chance to explore the differences between it and the other pathways. Similarly, the possible functional connections between the Notch and Tor pathways (membership100) and between Base Excision Repair and Mismatch Repair (PPI100) might lead to novel avenues for research.
Pathway Distiller found 53 and 105 enriched pathways when processing the MMS and TP53 case studies, respectively. Functional connections between pathways, in the form of interactions, gene membership and Gene Ontology (GO) annotations, will enable users to focus attention on meaningful groups instead of many individual members.
Table shows that the FOXA, FOXA1 and FOXA2/FOXA3 pathways were grouped by HPD. Figure illustrates the overlapping nature of the FOXA transcription factor network and the FOXA1 transcription factor network. HPD and Pathway Distiller's membership500 combined the three pathways into a single cluster; PPI500 and WC did not. As the figure indicates, there is some overlap among the three pathways. The FOXA1 gene is connected to the network through a single gene, AR, which is not included in the FOXA2/FOXA3 transcription factor network. Because HPD and membership500 rely on the overlap of gene sets, it is not surprising that their cluster contains all three pathways. Conversely, the interaction-based methods split the pathways into different concepts, in this case because of the functional importance of AR.
Interaction network for the FOXA, FOXA1 and FOXA2/3 transcription factor networks. The interaction between FOXA1 and AR is marked as present or missing to show whether AR is in the pathway.
In the second comparison with HPD, the HPD similarity scores placed the MAPK signaling pathway and the cancer pathways into different clusters. In Pathway Distiller's membership500 results, the MAPK signaling pathway was included in the cancer-related cluster. Pathway Distiller's cluster also includes Bladder, Colorectal cancer, Pancreatic cancer, Chronic myeloid leukemia, Thyroid, Prostate, and small cell lung cancers. Dhillon et al. describe how MAPK features prominently in cancer. In retrospect, the membership500 cluster could have hinted at the same conclusion.