A total of 3695 sentences reporting interactions between the six cell types used here and 38 cytokines were identified by ImmuneXpresso. These were unified into 217 unique interactions, with an average of 17 sentences per interaction. The resultant network is dense, with 54% of cell-cytokine pairs directly connected to one another. Compared with other biological networks, this is a very dense network [14], in agreement with the high density observed for the cellular immune network (nodes are cells, edges are cytokines) that Frankenstein et al. [14] derived from curated databases.
Figure 1. An automatically derived network of cells and cytokines from the immunological literature. Six immune system cell subsets (white nodes) and their relationships, through 217 edges, with 38 cytokines (grey nodes). Edge color, from light to dark, denotes the ...
Despite the high density we observe, the theoretical density that could be reached with 217 edges in a bipartite network of this size is 95% (217 of the 6 × 38 = 228 possible cell-cytokine pairs). However, many of the cell-cytokine pairs have more than one edge type spanning them, which likely reflects either incorrect semantic parsing of the text or the multiple functions of cytokines under different conditions or in specific cell subsets. Of the 217 edges, 113 were positive, 50 were negative and 54 were of undetermined type. Examination of the number of sentences shows that the evidence for most cell-cytokine relationships is predominantly of one type. Furthermore, there is a strong correlation (Pearson's r = 0.91 for cell types, 0.59 for cytokines) between the degree of a node and the number of sentences that support its interactions. This is to be expected in a literature-based network: the more research is done in a field, the more its separate components become associated with one another.
Figure 2. Interactions between B-cells and twelve cytokines extracted from abstracts by ImmuneXpresso. The x-axis shows the number of sentences in which the interaction was observed. Color denotes interaction type. White, gray and black denote positive, negative and undetermined ...
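The density and sentence-count figures quoted above follow from simple arithmetic, reproduced in the sketch below. Only the counts (6 cell types, 38 cytokines, 217 typed edges, 3695 sentences) come from the text; the variable names are illustrative.

```python
# Counts reported in the text; variable names are illustrative only.
n_cells = 6
n_cytokines = 38
n_typed_edges = 217
n_sentences = 3695

# In a bipartite network every cell can pair with every cytokine.
possible_pairs = n_cells * n_cytokines

# Upper bound on density if each of the 217 edges joined a distinct pair.
max_density = n_typed_edges / possible_pairs

# Mean number of supporting sentences per unique interaction.
sentences_per_interaction = n_sentences / n_typed_edges

print(f"possible cell-cytokine pairs: {possible_pairs}")
print(f"theoretical density: {max_density:.0%}")
print(f"sentences per interaction: {sentences_per_interaction:.1f}")
```

The gap between this upper bound and the observed 54% reflects pairs that carry more than one edge type.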
To evaluate our ability to correctly capture cell-cytokine interactions automatically from abstracts, we compared our results to the relevant subset of a manually curated set of cytokine-cell interactions from the COPE database and the Online Cytokine Reference [12]. Checking specifically by cell type, we compared each of our detected B-cell interactions against those available on the two websites. Overall, we had a 40% false-negative rate but no false positives, a better result than is usually observed for co-occurrence systems that search abstracts only [1]. Two of the cytokines interacting with B-cells, IL-21 and RANTES, did not appear in the Online Cytokine Reference but did appear in COPE. Similar false-negative rates were observed for dendritic cells. Interestingly, more recent discoveries in cytokine-cell regulation, such as the regulatory interactions of γδ-T-cells or IL-33, did not appear in either database, showing the power of automatic annotation for keeping up to date with the published literature.
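A comparison of this kind can be sketched with set operations, as below. This is a minimal illustration, not the evaluation code used here: the function name and the cytokine labels are hypothetical placeholders, and the toy sets are chosen only so that the false-negative rate comes out at 40% with no false positives.

```python
def compare_to_reference(extracted, curated):
    """Compare extracted (cell, cytokine) pairs with a curated reference set.

    Returns the false positives, the false negatives, and the false-negative
    rate, defined here as missed curated pairs / all curated pairs.
    """
    extracted, curated = set(extracted), set(curated)
    false_positives = extracted - curated   # reported but not curated
    false_negatives = curated - extracted   # curated but not reported
    fn_rate = len(false_negatives) / len(curated) if curated else 0.0
    return false_positives, false_negatives, fn_rate


# Hypothetical toy data: placeholder cytokine names, not real results.
curated = {("B-cell", f"cytokine_{i}") for i in range(1, 6)}
extracted = {("B-cell", f"cytokine_{i}") for i in range(1, 4)}

false_pos, false_neg, fn_rate = compare_to_reference(extracted, curated)
print(f"false positives: {len(false_pos)}")
print(f"false-negative rate: {fn_rate:.0%}")
```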
Edges in the ImmuneXpresso network link cells and cytokines to one another. In reality, these cells produce and secrete cytokines, which bind to membrane receptors expressed on the surface of another cell or even the same one. We hypothesized that inter-cellular interactions between cytokines and their receptors would be evident in cell-specific gene expression data. If we could detect cytokine and receptor gene expression, then we might be able to assemble gene-expression-based inter-cellular communication networks. Further, if cytokines and their binding partner receptors were expressed in different cells, exclusively of one another, we might be able to assign directionality to the currently undirected interactions reported by ImmuneXpresso. To test this, we assembled a compendium of cell-specific gene expression signatures from publicly available microarray studies (see 2.3) matching the cell types in the present version of the ImmuneXpresso lexicon, and identified the one or more receptors that each cytokine binds (see 2.4).
We asked how many cytokines in a given cell type are expressed exclusively of their receptor and how many are expressed in the same cell as their receptor. The majority of cytokines and receptors were expressed exclusively of their binding partner. For example, the 8 cytokines and 21 receptors we detect as expressed in T-regulatory cells may participate in 35 different binary interactions, both within our model framework (5 immune cell subsets, no data for γδ-T-cells) and outside it, but for only two (IL-13 binds IL4RA and IL-16 binds CD4) are both the receptor and the cytokine expressed by T-regulatory cells. We note that the cytokine IL-2, known both to be expressed by T-regulatory cells and to regulate them via IL2Ra (CD25), is not detected as expressed in T-regulatory cells under our criteria. Similarly, both cytokine and receptor are expressed in the same cell for only 7 out of 40, 10 out of 30, 6 out of 40 and 5 out of 34 interactions for B-cells, T-helper cells, CTLs and dendritic cells, respectively. Expression of both cytokine and receptor in the same cell may be indicative of auto-regulation, or of the expression of either the cytokine or its receptor under different conditions or in different subsets.
The ImmuneXpresso network is bipartite, with each edge representing an interaction between a cell and a cytokine. Each edge in this network has been extracted from the literature and is thus supported by experimental evidence. An interaction between two cells, on the other hand, must always consist of at least two edges. Unlike a single edge, a path of two or more edges cannot necessarily be interpreted as a cytokine-mediated interaction between two cells, as this requires one cell to produce the cytokine and the other to be affected by it. Here gene expression data can help, as it allows one to identify cells that express or produce cytokines and cells that have the potential to be affected by them because they express the corresponding cytokine receptors.
Ignoring edge types, the ImmuneXpresso network has 119 edges. In theory, these could encode 309 possible undirected regulatory paths between the 6 cells in our system, 119 of which are auto-regulatory. Using the gene expression data, we can now estimate how many of the paths theoretically derived from the ImmuneXpresso output are supported by gene expression, such that a cytokine and its receptor are expressed in the two communicating cells. To do so, we consolidated the gene-specific cytokine-receptor information we obtained from Entrez to match the level of detail of the lexicon ImmuneXpresso uses. For example, the information 'B-cells express both IL-10 and the IL-10 receptors IL-10Ra and IL-10Rb' is simplified in the consolidated form to 'B-cells express the IL-10 cytokine and a receptor to which it can bind'.
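One way such two-edge paths could be counted is sketched below. This is an assumed reading of the counting, not the authors' code: a path is taken to be an unordered pair of cells (possibly the same cell twice) joined through one shared cytokine. The function name and the toy edge list with placeholder cytokines are hypothetical.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def two_edge_paths(edges):
    """Enumerate undirected cell-cytokine-cell paths in a bipartite edge list.

    `edges` holds (cell, cytokine) pairs with edge types already ignored.
    Pairing a cell with itself yields an auto-regulatory path.
    """
    cells_by_cytokine = defaultdict(set)
    for cell, cytokine in edges:
        cells_by_cytokine[cytokine].add(cell)

    paths = []
    for cytokine, cells in sorted(cells_by_cytokine.items()):
        for a, b in combinations_with_replacement(sorted(cells), 2):
            paths.append((a, cytokine, b))
    return paths


# Hypothetical toy network: 3 cells, 2 placeholder cytokines, 5 edges.
toy_edges = [
    ("B-cell", "cyt1"), ("CTL", "cyt1"), ("T-helper", "cyt1"),
    ("B-cell", "cyt2"), ("CTL", "cyt2"),
]
paths = two_edge_paths(toy_edges)
auto_regulatory = [p for p in paths if p[0] == p[2]]
print(len(paths), len(auto_regulatory))
```

Under this counting each edge contributes exactly one auto-regulatory path, which is consistent with the 119 auto-regulatory paths reported for a network of 119 edges.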
Filtering the 309 ImmuneXpresso paths by requiring one of the cells to express a cytokine while the other expresses its receptor reduces the number of interactions to 158, of which 30 are auto-regulatory (the same cell expresses both the cytokine and the receptor). The remaining 151 non-functional paths are either a byproduct of the network representation or appear as such because of the threshold we set for which genes should be considered expressed (see 2.2). Furthermore, as we observed, many of the cytokines and receptors are expressed exclusively on one of the two cells. Therefore, unlike the directionless interactions ImmuneXpresso currently reports, many of the cytokine-mediated cell-cell interactions we infer from the gene expression data are directional. Of the 158 ImmuneXpresso paths supported by the gene expression data, we can assign directionality to 76. Last, we can ask the reverse question, namely how many of the possible cytokine-mediated cell-cell interactions appear in the gene expression data but cannot be traced to any path in ImmuneXpresso. We find 27 such paths, 21 of which we could not find in the manually curated cytokine databases [11,12]. Each represents a hypothetical cytokine-mediated cell-cell interaction that can be tested experimentally.
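The filtering and direction assignment described above can be illustrated with a small sketch. It assumes expression calls have already been reduced to two lookups, which cells express each cytokine and which cells express at least one of its receptors. The function name, the labels it returns and the toy data with placeholder cytokines are all hypothetical.

```python
def classify_path(path, producers, responders):
    """Classify a cell-cytokine-cell path using expression evidence.

    producers[cytokine]  -> set of cells expressing the cytokine
    responders[cytokine] -> set of cells expressing one of its receptors
    """
    a, cytokine, b = path
    makes = producers.get(cytokine, set())
    senses = responders.get(cytokine, set())
    a_to_b = a in makes and b in senses
    b_to_a = b in makes and a in senses

    if not (a_to_b or b_to_a):
        return "unsupported"
    if a == b:
        return "auto-regulatory"
    if a_to_b and b_to_a:
        return "supported, undirected"
    # Exactly one cell produces and the other responds: a direction exists.
    return f"{a} -> {b}" if a_to_b else f"{b} -> {a}"


# Hypothetical expression calls for two placeholder cytokines.
producers = {"cyt1": {"B-cell"}, "cyt2": {"B-cell", "CTL"}}
responders = {"cyt1": {"B-cell", "CTL"}, "cyt2": {"B-cell", "CTL"}}

for path in [("B-cell", "cyt1", "CTL"), ("B-cell", "cyt1", "B-cell"),
             ("CTL", "cyt1", "T-helper"), ("B-cell", "cyt2", "CTL")]:
    print(path, "=>", classify_path(path, producers, responders))
```

A path is left undirected when both cells express the cytokine and a receptor, since the expression data alone cannot then say which cell signals to which.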
The utility of a knowledgebase in machine-computable format stands out when conducting high-throughput, discovery-driven experiments. In such experiments, researchers rarely have expert knowledge of all the variables being assayed, and the number of results is often very high. Thus, it may be difficult to prioritize findings and link them to one another or to previous discoveries in order to establish a comprehensive perspective. As a proof of principle, we analyzed serum cytokine and cell subset frequency data measured at the Stanford Human Immune Monitoring Core for 29 individuals, males and females of varying ages. Of the 41 detected interactions (see 2.5), 18 were between a cell and a cytokine covered in the present version of the ImmuneXpresso lexicon (see Data sources 2.1). Remarkably, ImmuneXpresso could verify each of those 18 and match it with an interaction reported in the literature. For example, an interaction between IL-15 and CTLs, which our algorithm deduced from the positive correlation observed between IL-15 and CTLs, was detected 9 times in abstract sentences by ImmuneXpresso, in sentences such as 'These findings identify a novel CTL costimulatory pathway regulated by IL-15 and suggest that tissues can fine-tune the activation of effector T cells based on the presence or absence of stress and inflammation' [15]. Comparison with a second dataset, in which interactions of 13 cytokines with spleen-derived CD4+ T-helper cells were identified experimentally, showed that ImmuneXpresso could validate 10 of the 13 observed interactions. Manual searches for the other three interactions confirmed one additional interaction not captured by ImmuneXpresso. Each such interaction suggests a testable hypothesis that requires considerable time and resources to test. As datasets grow larger, machine identification and prioritization of novel findings to follow up on is key.
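The cross-check described above can be sketched as follows. The detection procedure itself is described in section 2.5 and is not reproduced here; the use of a Pearson coefficient with a 0.5 cutoff, the function names and all measurement values are assumptions made for illustration. The only figure taken from the text is the 9 supporting sentences for the IL-15 and CTL pair, and cytokine_X is a hypothetical placeholder.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def prioritize(measurements, knowledgebase, cutoff=0.5):
    """Split correlated (cell, cytokine) pairs by literature support.

    The cutoff is an assumed value, not one taken from the paper.
    """
    supported, candidates = {}, []
    for pair, (cytokine_level, cell_frequency) in sorted(measurements.items()):
        if abs(pearson(cytokine_level, cell_frequency)) < cutoff:
            continue  # too weak to count as a detected interaction
        if pair in knowledgebase:
            supported[pair] = knowledgebase[pair]  # sentence count
        else:
            candidates.append(pair)  # no literature match: follow up
    return supported, candidates


# Sentence count from the text; measurement values below are invented.
knowledgebase = {("CTL", "IL-15"): 9}
measurements = {
    ("CTL", "IL-15"): ([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]),
    ("CTL", "cytokine_X"): ([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]),
}
supported, candidates = prioritize(measurements, knowledgebase)
print("literature-supported:", supported)
print("candidates for follow-up:", candidates)
```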