The epidermal growth factor receptor (EGFR) signalling cascade is one of the best-studied and most important signalling pathways in mammals. It regulates cell growth, survival, proliferation and differentiation. Recently, a detailed and comprehensive map of the EGFR signalling pathway has been reported (Oda et al, 2005). As the map was built manually by experts using the literature, it can be seen as a reference representation of the pathway. Other reference maps of important signalling pathways have also been reported (Oda and Kitano, 2006; Calzone et al, 2008; Herrgard et al, 2008), providing the scientific community with comprehensive maps that can be used for modelling, which in turn will shed light on important aspects of cell signalling. However, these initiatives require huge efforts and, judging by the limited number of maps available so far, the availability of such reference maps lags behind the amount of data available in public databases. Hence, we argue that public pathway databases could be used to build such reference maps of signalling pathways. Most pathway databases are also developed by experts in the field and constitute repositories of high-quality data, with the additional advantage that the data are already represented in machine-readable formats that could, in principle, be easily and automatically retrieved, analysed and fed into modelling software tools.
We selected the EGFR pathway (Oda et al, 2005), hereafter referred to as the EGFR map, as a ‘gold standard’ to evaluate the completeness and accuracy of public pathway databases in representing the reactions that are part of EGFR signalling (Figure 1). We based our selection on the following reasons: (i) signalling through EGFR has been studied for more than 40 years and a lot of information about the reactions is already available (Citri and Yarden, 2006); (ii) it has been carefully curated by experts; (iii) it constitutes an excellent example of crosstalk between different signalling events, thus allowing evaluation of the coverage of crosstalks in the public databases and the ability of network analysis tools to retrieve and combine networks in a meaningful manner; (iv) the study of signalling through EGFR has important implications for understanding several cancer types and the development of new therapeutic strategies. Several computational models have been reported on different aspects of EGFR signalling (Kholodenko et al, 1999; Schoeberl et al, 2002; Hornberg et al, 2005; Birtwistle et al, 2007; Borisov et al, 2009; Li et al, 2009). However, to the best of our knowledge, no model of the whole EGFR map has been reported to date.
Figure 1 EGFR map. EGFR map created using CellDesigner ver.2.0 (Oda et al, 2005). This map has been coloured to show the entities and reactions that are in common between the EGFR map and information found in Reactome. Red colour denotes that the entities and ...
In the following paragraphs and in the accompanying figures, we use the same notation of entities as in the SBML file of the EGFR map (Oda et al, 2005). The EGFR map is based on more than 240 publications and contains several crosstalks between EGFR downstream signalling and other signalling pathways. In the EGFR map, depicted in Figure 1, entities are clustered according to their cellular location and function. The functional units comprise receptor endocytosis, recycling and degradation, small GTPase signalling, MAPK cascade, PIP signalling, cell cycle, Ca2+ signalling and GPCR-mediated transactivation. Seven phenotypic outcomes of EGFR signalling are depicted: ErbB endocytosis, ErbB degradation, apoptosis, actin reorganization, cell cycle, gene transcription and mitogenesis/tumourigenesis.
Figure 2 Comparison of ERK signalling as found in the EGFR map and in Reactome. In Reactome, ERK1 or ERK2 are phosphorylated by MKK1 or MKK2, respectively. The same reaction is found in the EGFR map, but here ERK1 and ERK2 are represented as a single entity, namely ...
Figure 3 Example: annotation issue. The two reactions ‘Active PLCγ hydrolyses PI4,5-P2’ (REACT_12078.2) and ‘IP3 binds with the IP3 receptor, opening the Ca2+ channel’ (REACT_12008.1) are connected through the entity IP3 ...
To assess the completeness and accuracy of pathway information available in public databases, and how well it can be retrieved automatically, we tried to recover the complete EGFR map. For this purpose, we queried Reactome version 26 with the term ‘EGFR’ and its UniProt identifier ‘P00533’, and downloaded and visualized the retrieved pathways. Reactome was chosen because it is currently the most detailed pathway repository and uses a data model that accommodates different types of biochemical reactions. For visualization, we chose Cytoscape because of its user-friendly visualization capabilities and its network analysis tools. The entities found in Reactome were mapped to those in the EGFR map through standard identifiers. We compared the original EGFR map with the EGFR pathway recovered from Reactome (in BioPAX format) and coloured entities and reactions according to their representation in both resources (see Figure 1). Red entities are identical in the EGFR pathway in Reactome and in the EGFR map. Purple denotes entities that could be recovered from Reactome but are represented differently (for instance, if a single protein instead of a complex is described as taking part in a reaction).
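The identifier-based comparison and colouring can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the entity records, the `classify` helper and the identifiers prefixed `HYPO_` are hypothetical, and only the UniProt accession P00533 for EGFR is taken from the text. Each entity is reduced to the set of identifiers of its components, so a complex carries more than one identifier.

```python
def classify(map_entities, reactome_entities):
    """Assign a colour to each EGFR-map entity by comparing identifier sets.

    red    : a Reactome entity has exactly the same identifiers
    purple : identifiers overlap, but the representation differs
             (e.g. a single protein instead of a complex)
    none   : no shared identifier was found
    """
    reactome_sets = set(reactome_entities.values())
    colours = {}
    for name, ids in map_entities.items():
        if ids in reactome_sets:
            colours[name] = "red"
        elif any(ids & other for other in reactome_sets):
            colours[name] = "purple"
        else:
            colours[name] = "none"
    return colours


# Toy data: only P00533 is a real accession; HYPO_* identifiers are made up.
egfr_map = {
    "EGFR": frozenset({"P00533"}),
    "EGFR:AdaptorX": frozenset({"P00533", "HYPO_0001"}),  # complex in the map
    "ProteinY": frozenset({"HYPO_0002"}),
}
reactome = {
    "EGFR [plasma membrane]": frozenset({"P00533"}),
    "AdaptorX [cytosol]": frozenset({"HYPO_0001"}),  # single protein only
}

colours = classify(egfr_map, reactome)
for entity, colour in colours.items():
    print(entity, colour)
```

Comparing sets of identifiers rather than names makes the match independent of naming conventions, but it still depends on both resources annotating their entities with the same identifier scheme.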
Only a small proportion of the original EGFR reactions could be recovered from Reactome, and most of them are directly related to signals coming from downstream EGFR signalling. Most of the reactions related to other signalling cascades are not connected with EGFR signalling in Reactome and could therefore not be recovered. Regarding the associated phenotypes, only two were found in Reactome: EGFR endocytosis and EGFR degradation. However, both mechanisms are described in a slightly different manner, in some cases in even more detail than in the original EGFR map. In a second step, we tried to extend the EGFR map by querying Reactome with key entities found in the EGFR map, in order to complete signalling cascades that are missing in the EGFR pathway in Reactome, such as GPCR signalling or the MAPK cascade. All pathways that were added are listed in the table of pathways downloaded for extending the EGFR map. We used additional colours to depict the entities that were recovered in this extension process. Green was used for entities found in both Reactome and the EGFR map, and turquoise was used for entities that differed in their representation in the two sources (the coloured EGFR map is available in XML and PDF formats as Supplementary information). Through this extension, we were able to recover four of the five missing phenotypes: actin reorganization, apoptosis, cell cycle and the transcription of target genes. However, some reactions were still missing in the information recovered from Reactome, and in some cases gaps or contradictions appear that impede automatic integration (Figure 2). In this example, reactions in which ERK1 and ERK2 participate are first described separately, and later the representation switches to a combined ERK1/2 entity.
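The ERK1/ERK2 gap can be illustrated with a toy reaction list. The sketch below rests on several assumptions: the reaction tuples, the `MEMBERS` table and the `find_links` helper are hypothetical and do not reflect the Reactome data model. It shows that exact name matching fails to connect reactions producing ERK1 or ERK2 to a reaction consuming a combined ERK1/2 entity, whereas expanding the combined entity into its members recovers the links.

```python
# Hypothetical lookup that expands a combined entity into its member proteins.
MEMBERS = {"ERK1/2": {"ERK1", "ERK2"}}


def members(entity):
    """Return the set of proteins an entity name stands for."""
    return MEMBERS.get(entity, {entity})


def find_links(reactions, expand=False):
    """Return sorted (upstream, downstream) pairs of reaction names for which
    an output of the upstream reaction matches an input of the downstream one.

    With expand=False only identical names match; with expand=True combined
    entities are compared through their member proteins.
    """
    links = set()
    for up_name, _, up_outputs in reactions:
        for down_name, down_inputs, _ in reactions:
            if up_name == down_name:
                continue
            for out in up_outputs:
                for inp in down_inputs:
                    if expand:
                        matched = bool(members(out) & members(inp))
                    else:
                        matched = out == inp
                    if matched:
                        links.add((up_name, down_name))
    return sorted(links)


# Toy reactions as (name, inputs, outputs); names are illustrative only.
reactions = [
    ("activate_ERK1", ["MKK1"], ["ERK1"]),
    ("activate_ERK2", ["MKK2"], ["ERK2"]),
    ("ERK_targets", ["ERK1/2"], ["target"]),
]

strict_links = find_links(reactions)
expanded_links = find_links(reactions, expand=True)
print(strict_links)
print(expanded_links)
```

In practice the member table would have to come from the complex or entity-set annotations of the database, which is where the manual intervention described in the text enters.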
Pathways downloaded for extending the EGFR map
Regarding the reactions that give rise to regulatory loops in the EGFR map, only some of them could be recovered from Reactome. For instance, although the reaction that involves cleavage of pro-HB-EGF by ADAMs is described, its regulation by Pyk2 and c-Src is not included and therefore this positive feedback loop is not coloured in the EGFR map. In total, three of the six negative feedback loops were detected: inhibition of EGFR by SHP1, downregulation of EGFR and phosphorylation of SOS1 by ERK1, which leads to SOS1 inhibition.
Although most of the crosstalks between signalling cascades in EGFR signalling could be established by the extension process, a significant number were not found because the entities that link the different cascades are missing in Reactome. For example, the important crosstalk between Ca2+ and EGFR signalling through the effect of Ca2+ on Pyk2 activity could not be recovered, as Pyk2 is not present in Reactome. Moreover, it is worth mentioning that details about some of the reactions differ between the EGFR map and the data found in Reactome. In part, this can be explained by the fact that the former is based on literature curated in 2005, whereas version 26 of Reactome was released in October 2008.
The extension process was carried out by searching the database with entities representing the main signalling cascades that are known to be connected with EGFR signalling, followed by manual identification of the reactions that connect the pathways. In principle, the process of finding the connections or crosstalks between pathways could be automated using tools available in Cytoscape or Pathway Commons (cPath). The Cytoscape merging function was evaluated for this purpose. This function compares the attributes of the nodes to automatically connect reactions from different pathways. However, when it was tested on the reactions in the EGFR map, several problems arose, most of them as a result of annotation issues. For instance, Figure 3 shows two reactions in which the two IP3 entities are annotated differently. The first IP3 entity is located in the ‘cytosol’, whereas this cellular location annotation is missing for the second IP3, which prevents the expected merging of the two reactions. Hence, automatic integration is impeded and manual intervention is needed. Another factor that hampers finding connections between reactions or pathways is the use of combined entities. For instance, the already mentioned ERK1 and ERK2 proteins, first represented as separate entities, are later described as a combined ERK1/ERK2 entity (Figure 2). This problem could be solved by considering all the annotations of the nodes when deciding whether two entities are equivalent. This would allow comparison of nodes that represent states of entities, for instance, post-translationally modified proteins or proteins annotated with cellular locations. In summary, reconstructing crosstalks between signalling pathways is difficult with the automatic tools currently available. Manual intervention is required to recover all reactions involved in the pathways and their crosstalks.
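The IP3 annotation problem can be made concrete with a small sketch. It does not use the Cytoscape API: the node dictionaries, the attribute names and both matching functions are hypothetical, and treating a missing annotation as compatible is only one possible design choice, not the behaviour of any existing tool.

```python
def strict_match(a, b):
    """Merge two nodes only if all attributes are identical.

    A missing attribute counts as a difference, which reproduces the
    failure described for the two IP3 entities.
    """
    return a == b


def tolerant_match(a, b, required=("name",)):
    """Merge two nodes if the required attributes agree and no attribute
    that is annotated on both nodes has conflicting values.

    Attributes annotated on only one node are ignored (assumption).
    """
    if any(a.get(key) is None or a.get(key) != b.get(key) for key in required):
        return False
    shared = (set(a) & set(b)) - set(required)
    return all(a[key] == b[key] for key in shared)


# Toy nodes; the third one is an invented conflicting annotation.
ip3_product = {"name": "IP3", "location": "cytosol"}
ip3_ligand = {"name": "IP3"}  # cellular location annotation missing
ip3_elsewhere = {"name": "IP3", "location": "extracellular region"}

print(strict_match(ip3_product, ip3_ligand))       # strict comparison fails
print(tolerant_match(ip3_product, ip3_ligand))     # missing annotation tolerated
print(tolerant_match(ip3_product, ip3_elsewhere))  # conflicting location rejected
```

Ignoring missing annotations has a cost: a node lacking a modification annotation would also be merged with its phosphorylated form, so such a rule would probably need to be restricted to attributes like cellular location and checked manually.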
The case study presented here shows that a process combining automatic retrieval and manual intervention can be used to reconstruct the main features of the EGFR map. It also shows that current pathway databases contain a lot of detailed information, although in some cases this information is still incomplete and manual intervention is needed to obtain a complete and correct network representation that spans different signalling pathways. This is especially critical for reactions that are part of regulatory feedback loops, as these determine the dynamic behaviour of the signalling pathway. Nevertheless, the information obtained for individual reactions, and even for some pathways, is quite complete and would be accurate enough to serve as a starting point for model building. A proposed strategy for using pathway data from public databases for network modelling is presented in Box 1.