Dendritic cell-specific pathways in DC-ATLAS
DC-ATLAS is one of the first integrated immunology and bioinformatics projects to comply with the Systems Biology Graphical Notation (SBGN) [10]. It is built around a database of signal transduction pathways that have been extensively curated specifically for DCs.
Every gene and reaction was annotated with the organism, the organism part, the cell type and the experimental details under which the evidence was obtained. The community of curators within DC-ATLAS manually annotated the pathways, providing the most up-to-date references available in existing databases and the literature, and generating experimental evidence in their own laboratories where it was lacking. The curation procedure is described in more detail in Additional file 1.
Development of a specific data format for DC-ATLAS
To ensure that the results of the curation process could be fully used for representation and data analysis, a DC-specific data format, the Biological Connection Markup Language (BCML), was developed to represent pathways according to the specification proposed by the Systems Biology Graphical Notation (SBGN) [10]. BCML provides a machine-readable representation of the pathways, which can be used for description, manipulation, analysis and graphical representation. BCML is an XML-based format that implements the complete Process Description (PD) specification from SBGN, including not only the definition of the elements but also the rules and constraints needed to assemble a network.
In addition to a full implementation of the PD specification, BCML provides a series of optional features. First, BCML can include additional information on the entities that compose the network: each entity can be described by a series of species-specific database identifiers, e.g. Entrez Gene or UniProt accession numbers. Furthermore, each entity or reaction can have a set of facts, or "findings", associated with it. Findings are collections of biological information that are relevant to that entity or reaction. The current specification includes support for organism, organism part (tissue), cell type, the specific biological environment in which the evidence was obtained, and the type of experiment used to gather the evidence. To reduce ambiguity and promote consistency among different findings, the schema enforces a controlled vocabulary built from current medical ontologies.
The specification of BCML is accompanied by a series of programs (the BCML software suite) that enable both bioinformaticians and biologists to use and manipulate the format. First, the software suite validates pathways described in BCML, to ensure consistency and proper enforcement of the SBGN rules. Second, the software can create fully SBGN-compliant graphical representations by transforming the BCML XML into other formats (GraphML), which can then be saved as images with third-party software.
The format also permits filtering of the pathway data to create a new network containing only elements with user-defined characteristics, so that tailor-made pathways can be produced for individualized analyses. The tools in the BCML software suite perform this filtering using all the information stored with the pathway. For example, nodes and edges can be selected for a specific cell type or organism, permitting the construction of customized network maps that represent specific biological contexts. When a filter is applied to the pathway, elements are marked as "included", "excluded", or "affected". An element is included in the resulting map if it matches the selected filter criteria and excluded if it does not. The "affected" state indicates elements that may be absent as a consequence of the filtering; for example, in a specific cell type a complex may not form if one or more of its proteins are not present. Filtering may be used to assist data analysis and interpretation and may point to gaps in current knowledge.
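The three filter states can be illustrated with a minimal Python sketch. This is not the BCML software suite: the `Entity` class, the `filter_by_cell_type` function and all entity names are hypothetical, and the rule that a complex becomes "affected" as soon as one component is excluded is an assumption based on the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical pathway node with the cell types in which it has evidence."""
    name: str
    cell_types: set = field(default_factory=set)
    components: list = field(default_factory=list)  # non-empty only for complexes


def filter_by_cell_type(entities, cell_type):
    """Assign "included", "excluded" or "affected" to every entity."""
    states = {}
    # First pass: simple entities either match the filter criterion or not.
    for e in entities:
        if not e.components:
            states[e.name] = "included" if cell_type in e.cell_types else "excluded"
    # Second pass: a complex is "affected" when any of its components is excluded.
    for e in entities:
        if e.components:
            if any(states.get(c) == "excluded" for c in e.components):
                states[e.name] = "affected"
            else:
                states[e.name] = "included" if cell_type in e.cell_types else "excluded"
    return states


# Placeholder entities; the names do not refer to curated DC-ATLAS content.
pathway = [
    Entity("protein_A", {"dendritic cell", "macrophage"}),
    Entity("protein_B", {"macrophage"}),
    Entity("complex_AB", {"macrophage"}, ["protein_A", "protein_B"]),
]
dc_states = filter_by_cell_type(pathway, "dendritic cell")
macrophage_states = filter_by_cell_type(pathway, "macrophage")
```

With these placeholder data, filtering for dendritic cells excludes `protein_B` and therefore marks `complex_AB` as affected, whereas filtering for macrophages includes all three elements.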
The BCML format can incorporate any kind of experimental measurement that can be matched to the identifiers of an element. This allows the BCML map to be modified to incorporate high-throughput data from transcriptomic or proteomic experiments. The outcome is visualized in different colors on the graphical map.
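As an illustration of how measurements might be matched to element identifiers and turned into colors, the following Python sketch can be considered. The function names, the color scheme, the threshold of 1.0 and the rule of using the largest absolute change per node are all assumptions made for this example and are not taken from the BCML specification.

```python
def color_for(log2_fold_change, threshold=1.0):
    """Map a log2 fold change to a display color (assumed color scheme)."""
    if log2_fold_change >= threshold:
        return "red"    # up-regulated
    if log2_fold_change <= -threshold:
        return "blue"   # down-regulated
    return "grey"       # measured, but below the threshold


def overlay(node_identifiers, measurements):
    """Color each node by the measurement matched through its identifiers.

    node_identifiers: node name -> list of database identifiers
    measurements:     identifier -> log2 fold change
    """
    colors = {}
    for node, identifiers in node_identifiers.items():
        values = [measurements[i] for i in identifiers if i in measurements]
        if values:
            # Assumption: a node with several identifiers shows its largest change.
            colors[node] = color_for(max(values, key=abs))
        else:
            colors[node] = "white"  # no measurement could be matched
    return colors


# Placeholder identifiers and values, invented for illustration only.
nodes = {"node_A": ["ID_1"], "node_B": ["ID_2", "ID_3"], "node_C": ["ID_4"]}
data = {"ID_1": 2.3, "ID_2": -0.2, "ID_3": -1.7}
node_colors = overlay(nodes, data)
```

Nodes without any matched identifier are kept on the map and left uncolored, so that missing measurements remain visible rather than being silently dropped.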
Finally, BCML allows transformation of the pathways into different data formats, which may be needed for further analysis. Tools provided within the suite generate identifier (gene) lists from a BCML file, enabling their use with analysis methods such as Gene Set Enrichment Analysis (GSEA) or Fisher's exact test. Additionally, the format can be converted to a form amenable to impact analysis with the SPIA R package. This conversion can take into account the filtering applied to the elements of the pathway, so that individualized analyses can be carried out.
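To make the use of such gene lists concrete, the sketch below computes a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution) for the overlap between a pathway gene list and a list of differentially expressed genes. It uses only the Python standard library; the function name and the gene identifiers `g1` to `g20` are placeholders, and the code is not part of the BCML software suite.

```python
from math import comb


def enrichment_p_value(pathway_genes, de_genes, universe):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= observed overlap) when len(de) genes are drawn at random
    from the universe, which contains len(pathway) pathway genes.
    """
    pathway = set(pathway_genes) & set(universe)
    de = set(de_genes) & set(universe)
    overlap = len(pathway & de)
    n_universe, n_pathway, n_de = len(universe), len(pathway), len(de)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(overlap, min(n_pathway, n_de) + 1)
    )
    return tail / comb(n_universe, n_de)


# Placeholder data: 20 measured genes, 5 in the pathway, 4 differentially
# expressed, 3 of which fall in the pathway.
universe = {f"g{i}" for i in range(1, 21)}
pathway_list = {"g1", "g2", "g3", "g4", "g5"}
de_list = {"g1", "g2", "g3", "g10"}
p_value = enrichment_p_value(pathway_list, de_list, universe)
```

For this toy input the tail probability is 155/4845, roughly 0.032. Restricting both lists to the measured universe before counting is a design choice that avoids inflating the overlap with genes that could never have been detected.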
A detailed description of the BCML format is available as Additional file 2.
TLR pathway curation and modular structure in DC-ATLAS
At present, the human TLR pathway set in DC-ATLAS is a network organized as an ensemble of 8 pathways (TLR1-2, TLR2-6, TLR3, TLR4, TLR5, TLR7, TLR8 and TLR9), subdivided into 10 sensing modules, 32 signal transduction modules and 30 outcome modules. In contrast to existing databases, TLR7 and TLR8 were curated separately. Although their genes lie in close proximity on chromosome X and are highly homologous, recent evidence suggests that they have distinct roles in the DC-mediated immune response [11]. For example, although both TLRs bind the same ligand and largely overlap in their signaling, stimulated TLR7 activates the transcription factor IRF7 [14], while IRF1 [15] is an effector of TLR8-mediated signaling only.
Expert-guided, manual curation of the pathways has been a crucial part of the DC-ATLAS initiative, leading to a substantial "reshaping" of the existing pathways. For example, curation of the TLR3 pathway led to the validation of only about 50% of the genes included in the list originally retrieved from public databases (Figure and ). Furthermore, a number of genes not previously annotated as belonging to the TLR3 pathway in publicly available databases were found to participate in the signaling cascade in DCs. In particular, the number of target genes has been substantially extended and now includes the cytokines IL-10 [16] and IL-1α [17], the chemokines CCL3 [17] and the CCR7 chemokine receptor [20], the co-stimulatory molecule CD83 [16], the transcription factor STAT4 [21] and the enzyme INDO [19].
Figure 1 Comparison of the DC-ATLAS Toll Like Receptor (TLR) 3 pathway with other pathway databases. (A) Representation of the KEGG Toll-like receptor (TLR) pathway. The TLR3 signal is highlighted in red. (B) Representation of the KEGG TLR3 pathway, displaying ...
Another example of the importance of DC-ATLAS curation is that in the sensing module of TLR9 we found a new element, UNC93B1, whose involvement in signaling had already been demonstrated in 2007 [23]. All the other improvements of DC-ATLAS with respect to existing pathways fall mainly in the signal transduction and outcome modules.
A summary of all the new genes and/or connections, not previously annotated in TLR pathways, and present in DC-ATLAS is presented in Table . Since the field is rapidly evolving, when new evidence appears demonstrating that new or so far excluded interactions are operating in DCs, DC-ATLAS will be updated accordingly.
DC-ATLAS curation results: number and names of new genes present in TLR pathways of DC
To facilitate meaningful analysis of "omics" data, the pathways in DC-ATLAS are organized in a modular structure. Every signaling cascade downstream of a specific receptor was divided into 3 types of modules, in which the very last component of one module is also the first component of the subsequent module. The first type is the receptor and sensing module, which comprises the component(s) of the pathway that interact directly with the stimulus. The second type, the transduction module, encompasses all components that transduce the incoming signal from the sensing module downstream to the nucleus. This module generally starts with a molecule interacting with the receptor and ends with a transcription factor. The third and final type is the outcome module, which describes the end result of the signaling process. This last module begins with a transcription factor and includes target genes whose expression is altered after activation of the receptor. Complex cell functions, such as apoptosis, migration and differentiation, are also described as outcomes.
According to this module definition, the pathways in DC-ATLAS may contain more than one module of each type. As an example, Figure shows the modular structure of the TLR3 pathway curated for DC-ATLAS. In this pathway, one receptor/sensing module and three transduction modules, leading to the activation of three critical transcription factors (IRF3, NF-kB and AP-1), have been identified.
Figure 2 SBGN representation of the DC-ATLAS human TLR3 signaling pathway. The different modules are represented: The Receptor/Sensing module (R/S, in yellow), the different Transduction modules (T1, light grey; T2, pink; T3, light blue) and the Outcome modules ...
The modules, as we defined them, have been subsequently tested using gene expression data as described in the following paragraphs. It should be emphasized that the transduction modules are not independent but are highly interconnected and partially overlapping. Furthermore, a given outcome may result from activation of more than one transduction module.
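The modular organization can be sketched as a simple data structure. In the Python example below, the `Module` class, the helper functions and all component names are hypothetical placeholders rather than curated DC-ATLAS content; the sketch only illustrates the two properties stated above, namely that consecutive modules share a boundary component and that transduction modules may overlap.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Hypothetical module: an ordered list of pathway components."""
    name: str
    kind: str          # "sensing", "transduction" or "outcome"
    components: list   # ordered from first to last component


def is_chained(upstream, downstream):
    """True when the last component of one module is the first of the next."""
    return upstream.components[-1] == downstream.components[0]


def shared_components(a, b):
    """Components that two modules have in common."""
    return set(a.components) & set(b.components)


# Placeholder modules: one sensing module feeding two transduction modules.
sensing = Module("R/S", "sensing", ["ligand", "receptor", "adaptor"])
t1 = Module("T1", "transduction", ["adaptor", "kinase_1", "tf_1"])
t2 = Module("T2", "transduction", ["adaptor", "kinase_1", "kinase_2", "tf_2"])
o1 = Module("O1", "outcome", ["tf_1", "target_gene_1", "target_gene_2"])

chain_ok = is_chained(sensing, t1) and is_chained(t1, o1)
overlap_t1_t2 = shared_components(t1, t2)
```

Here `T1` and `T2` both start from the same adaptor and share `kinase_1`, which mirrors the partial overlap between transduction modules, while `O1` can only follow `T1` because it begins with the transcription factor on which `T1` ends.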
The data format we used to describe the pathway allowed us to depict interactions in the cellular organelles where they occur, as well as to mark genes and interactions according to the biological system (e.g., cell type and species) in which they were observed. Thus, we were able to create a map of the TLR3 pathway that, for example, clearly shows which genes and interactions have been described in DCs and which have not (Figure , Additional file 3, Figure S1 and Additional file 4).
Figure 3 Presence or absence of specific Toll-like receptor (TLR) 3 pathway elements in different cell types according to the currently available knowledge. (A) Section of TLR3 pathway described in DCs; (B) Section described in macrophages. Grey elements are members ...
Overall, these results provide strong support for the importance of curating a pathway with the final aim of defining all interactions and nodes occurring in a specific species, cell type and compartment.
DC-ATLAS is a powerful tool to dissect TLR specific contributions and to analyze time course related responses
To assess the importance of the modular structure of DC-ATLAS and its statistical approach in dissecting the contribution of individual TLRs, we performed a time-course transcriptional analysis of moDCs stimulated with LPS and resiquimod (R848), which activate TLR4 and TLR7/8, respectively. We calculated pathway signatures for each of these datasets and subsequently clustered the resulting pathways (see Methods).
When pathway signatures were clustered using publicly available TLR pathways, it proved virtually impossible to obtain information on individual, potentially affected elements within the TLR pathway, despite clear up-regulation at the pathway level. In contrast, clustering of DC-ATLAS based results readily separated the different stimulatory conditions (Figure ). The total matrix used for clustering is available as Additional file 5.
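The distance computation that underlies this kind of clustering can be illustrated with a short Python sketch. The condition names and score vectors below are invented for illustration and are not values from our datasets; the sketch only finds the pair of conditions with the smallest Euclidean distance, which corresponds to the first merge step of an agglomerative clustering, and does not reproduce the support-tree procedure described in Methods.

```python
from itertools import combinations
from math import dist


def closest_pair(signatures):
    """Return the two conditions whose signature vectors are nearest."""
    return min(
        combinations(sorted(signatures), 2),
        key=lambda pair: dist(signatures[pair[0]], signatures[pair[1]]),
    )


# Invented pathway-signature vectors: one score per module, per condition.
toy_signatures = {
    "cond_A": [1.0, 0.0, 2.0],
    "cond_B": [1.0, 0.5, 2.0],
    "cond_C": [-2.0, 3.0, 0.0],
}
nearest = closest_pair(toy_signatures)
nearest_distance = dist(toy_signatures[nearest[0]], toy_signatures[nearest[1]])
```

Because each vector holds one score per module, two conditions end up close only when they agree across sensing, transduction and outcome modules, which is why module-level signatures can separate conditions that look alike at the whole-pathway level.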
Figure 4 Pathway analysis on microarray data on DCs stimulated with R848 and LPS using DC-ATLAS pathways. (A) Section of clustering of PEF and score using Euclidean distance using support trees on DCs stimulated with R848 and LPS for different periods of time: ...
As expected, both the TLR7/8 and TLR4 modules were affected upon specific stimulation with R848 and LPS, respectively [24]. At early time points, the analysis revealed activation of specific signal transduction modules, while at later time points outcome modules were clearly activated and sensing modules were down-regulated or not affected, indicating a general feedback regulation in fully matured DCs. At this stage, DCs have committed to their fate and decided how to respond to a specific stimulus, making some of their sensing receptors redundant. Despite the overlap between signaling from both receptors, the cluster analysis indicated that DCs stimulated for 6 hours with R848 behave similarly to cells stimulated for 3 hours with LPS, pointing to a slower activation of signaling through TLR7/8, perhaps due to their intracellular localization in the endosome. At 24 hours, when the DC maturation process is completed, the profiles of the pathway signatures are more similar between the two stimuli.
In time course experiments too, the modular structure of DC-ATLAS makes time-dependent changes in expression visible, providing a more informative analysis. The TLR4-sensing module is repressed at 3 hours of LPS stimulation. After 12 hours of stimulation, the MyD88 dependent signaling module is less abundant when compared to MyD88 independent transduction modules during the earliest time points. As can be seen in Figure , after 24 hours of LPS stimulation, the outcome modules activated by AP-1 become repressed. Similarly, upon R848 stimulation, the sensing module is over-represented at early time points and switched off later on. After 24 hours, several parts of the signal transduction module are repressed, as is the outcome module, indicating a commitment of the cells or a feedback regulation. Together, these observations demonstrate that, using DC-ATLAS, we can follow the signal as a temporal series of discrete events across all the modules, from sensing to outcome through the transduction part.
As can be seen from the analysis, in addition to a single TLR specific pathway, a number of other TLR pathways can be affected by the stimuli used. This is because several of the TLR pathways, such as the TLR4 and TLR7/8 pathways, share some elements, although this does not necessarily mean that their engagement leads to an identical outcome. When analyzing the LPS dataset at 3 hours, 135 genes were found to be differentially expressed within the DC-ATLAS pathways, and 47 of them belonged to the TLR4 signaling pathway (Figure ). Among these, 24 were shared with the TLR7/8 pathways, while 23 elements were specific to TLR4 (Figure ).
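The partition of differentially expressed genes into shared and pathway-specific elements is a plain set operation, sketched below in Python. The gene identifiers are synthetic placeholders constructed only to mirror the counts reported above (135, 47, 24 and 23); they do not correspond to the actual genes, and the function name is hypothetical.

```python
def split_shared_specific(de_genes, pathway_a, pathway_b):
    """Split the DE genes of pathway A into those shared with B and the rest."""
    in_a = de_genes & pathway_a
    shared = in_a & pathway_b
    specific = in_a - pathway_b
    return shared, specific


# Synthetic identifiers mirroring the reported counts, not real gene names.
de_genes = {f"gene_{i:03d}" for i in range(135)}        # 135 DE genes
tlr4_pathway = {f"gene_{i:03d}" for i in range(47)}     # 47 of them in TLR4
tlr78_pathway = {f"gene_{i:03d}" for i in range(24)}    # 24 also in TLR7/8
tlr78_pathway |= {"gene_200", "gene_201"}               # not DE, ignored

shared, tlr4_specific = split_shared_specific(de_genes, tlr4_pathway, tlr78_pathway)
```

The two resulting sets are disjoint and together account for all 47 differentially expressed TLR4 genes (24 shared plus 23 specific).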
Given the modular structure and DC specific annotations of DC-ATLAS, we can also evaluate individual elements involved in TLR specific signaling. For example, we mapped genes from our data set that were differentially expressed after 3 hours of LPS stimulation to the TLR4 pathway (Figure and Additional file 6, Figure S3). Using this map and the output of the pathway analysis, it becomes possible to follow the entire flow of the signal, from the receptor to the final activation of transcription of specific genes inside the nucleus. It is well established that TLR4 engagement can result in different signaling, depending on the adaptors recruited [25]. The signal starts either from MyD88 and the MyD88-like adapter (TIRAP), or from the TIR-domain-containing adapter-inducing interferon-beta (TRIF, also shared by TLR3) and the TRIF-related adapter molecule (TRAM). We observed that this is highly time dependent, as the signal through TRAM was still down-regulated at 3 hours (Figure ) and became up-regulated at 6 hours after stimulation (Figure and Additional file 7).
These results thus illustrate that the modular structure of DC-ATLAS allows a better and more detailed understanding of TLR mediated signaling in time course experiments.
DC-ATLAS can discriminate between species-specific pathways
Currently, studies on mouse DCs outnumber those on human cells; however, comparisons between mouse and human models have been somewhat biased by biological differences between the two species [27], as well as by differences in the origin of the material used to study DCs, e.g. bone marrow derived mouse DCs (BMDC) versus monocyte derived human DCs (moDCs). The species-specific curation of the DC-ATLAS pathways allowed us to highlight the differences between mouse and human model DC signaling in response to similar stimuli (Figure ). When we performed a pathway and cluster analysis on publicly available human moDC (GSE2706, GSE4984) and mouse BMDC (GSE15087) datasets, we could clearly distinguish the mouse profile from the human profile, even though both were stimulated with LPS (Figure ), although it should be taken into account that the cells were derived from different progenitors. The total matrix used for clustering is available as Additional file 8.
Figure 5 Pathway analysis on microarray data on human or mouse DCs stimulated with TLR ligands. Dendrogram of PEF cluster and score using Euclidean distance using support trees on human moDC or mouse bone marrow derived DCs (BMDCs) stimulated with LPS. The numbers ...