

HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal.
Cell Host Microbe. Author manuscript; available in PMC 2011 April 12.
Published in final edited form as:
PMCID: PMC3074406

Taming Data


A challenge in systems-level investigations of the immune response is the principled integration of disparate data sets for constructing predictive models. InnateDB (Lynn et al., 2008), a publicly available, manually curated database of experimentally verified molecular interactions and pathways involved in innate immunity, is a powerful new resource that facilitates such integrative systems-level analyses.

The innate immune system is governed by complex networks of molecular interactions which, when perturbed, can result in dysfunction such as inflammatory disease. Systems biology approaches, which combine high-throughput measurement technologies with computational analysis and mathematical modeling, are allowing us to study such molecular systems at multiple levels of regulation (Gilchrist et al., 2006; Oda and Kitano, 2006). These levels include protein-DNA interactions in transcriptional regulatory networks, microRNA regulatory networks, epigenetic processes involving DNA methylation and chromatin remodeling, protein-protein interactions in signaling networks, and autocrine and paracrine communication networks between cells. Mathematical models inferred from such global data sets, measured dynamically over time and integrated with information available in public databases, will make it possible to predict cellular behavior under new conditions or environmental perturbations. Ultimately, they will enable the development of systematic intervention strategies for reversing pathophysiological effects.

One of the most significant challenges in systems-level investigations of the immune response is the principled integration of disparate types of data for constructing such predictive models. Information about molecular interactions and pathways is critical for addressing this challenge because it can:

(1) substantially reduce the search space for data-driven network reconstruction, since the inferred dynamical rules of interaction can be constrained or biased toward known network structures;

(2) lower the sample size required for network inference, since prior information about connectivity is already available; and

(3) facilitate the interpretation of complex data sets, which can be visualized in the context of these pathways.

In a recent issue of Molecular Systems Biology, Lynn et al. (2008) describe a powerful new resource that facilitates such integrative systems-level analyses. InnateDB is a publicly available, manually curated database of experimentally verified human and mouse molecular interactions and pathways involved in innate immunity.
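Point (1) can be illustrated with a toy sketch. Everything in it is hypothetical: the gene names, time courses and prior edge list are invented. The lag-1 correlation score is a deliberately simple stand-in; it is not a method used by InnateDB or by the cited studies. Without a prior, each of n genes has n - 1 candidate regulators, so n(n - 1) pairs must be scored. Restricting each target's candidates to its known interaction partners reduces this to the sum of node degrees in the prior network.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def candidate_regulators(target, genes, prior_edges=None):
    """Without a prior, every other gene is a candidate; with one, only known partners."""
    if prior_edges is None:
        return sorted(g for g in genes if g != target)
    partners = set()
    for a, b in prior_edges:  # prior edges are treated as undirected
        if a == target:
            partners.add(b)
        elif b == target:
            partners.add(a)
    return sorted(g for g in partners if g in genes and g != target)


def search_space_size(genes, prior_edges=None):
    """Number of (regulator, target) pairs the inference step has to score."""
    return sum(len(candidate_regulators(t, genes, prior_edges)) for t in genes)


def infer_regulators(expr, prior_edges=None, top_k=1):
    """Rank candidates by |lag-1 correlation|: regulator at time t vs. target at t + 1."""
    genes = sorted(expr)
    ranked = {}
    for target in genes:
        def score(g):
            return abs(pearson(expr[g][:-1], expr[target][1:]))
        cands = candidate_regulators(target, genes, prior_edges)
        ranked[target] = sorted(cands, key=score, reverse=True)[:top_k]
    return ranked


if __name__ == "__main__":
    # Hypothetical time courses (arbitrary units) and a hypothetical prior network.
    expr = {
        "A": [0, 1, 2, 3, 4],
        "B": [0, 0, 1, 2, 3],
        "C": [0, 0, 0, 1, 2],
        "D": [5, 3, 4, 1, 2],
    }
    prior = [("A", "B"), ("B", "C")]
    print(search_space_size(sorted(expr)))         # 12 pairs without a prior
    print(search_space_size(sorted(expr), prior))  # 4 pairs with the prior
    print(infer_regulators(expr, prior))
```

Gene D has no known partners in the prior, so it receives no candidate regulators at all. This is the trade-off of a hard constraint. A softer alternative would bias scores toward known edges instead of excluding unknown ones.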

InnateDB contains more than 3500 curated innate immunity-relevant interactions involving around 1000 genes. To improve coverage of the innate immunity interactome, it also includes over 100,000 known interactions and approximately 2500 pathways from major public databases, grouped to avoid redundancy between databases. The manually curated data are richly annotated with supporting publications, interaction detection methods, interaction types, information about the cells or tissues involved, and other contextual details. Where available, the corresponding annotations from the source databases were also imported. Importantly, all interaction data in InnateDB are available for download in a standard format. Computational biologists can therefore analyze them with the tools of their choice and integrate those analyses with other measurement data. Detailed orthology predictions have also been integrated into InnateDB to facilitate comparative network analysis of human, mouse, and cow.
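The following sketch illustrates what a downloadable, standard-format interaction file enables. It parses tab-delimited interaction records and uses an orthology table to find mouse interactions that are also reported in human. It assumes a PSI-MITAB 2.5-style column layout, with interactors A and B in the first two columns and taxon IDs in columns 10 and 11. The exact layout of InnateDB's export should be checked against its own documentation. The records, the `make_record` helper and the ortholog mapping are hypothetical toy data.

```python
import csv
import io

# Assumed 0-based column positions in a PSI-MITAB 2.5-style file.
COL_A, COL_B, COL_TAXID_A, COL_TAXID_B = 0, 1, 9, 10


def parse_taxid(field):
    """Extract the numeric taxon from a field such as 'taxid:9606(Human)'."""
    value = field.split(":", 1)[-1]
    return value.split("(", 1)[0]


def load_edges(handle, taxid):
    """Return undirected edges (frozensets) whose two interactors both belong to `taxid`."""
    edges = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank and header lines
            continue
        if parse_taxid(row[COL_TAXID_A]) == taxid and parse_taxid(row[COL_TAXID_B]) == taxid:
            a, b = row[COL_A], row[COL_B]
            if a != b:  # ignore self-interactions in this sketch
                edges.add(frozenset((a, b)))
    return edges


def conserved_edges(mouse_edges, human_edges, mouse_to_human):
    """Map mouse edges onto human genes via orthologs; keep those also present in human."""
    mapped = set()
    for edge in mouse_edges:
        a, b = tuple(edge)
        if a in mouse_to_human and b in mouse_to_human:
            mapped.add(frozenset((mouse_to_human[a], mouse_to_human[b])))
    return mapped & human_edges


def make_record(a, b, taxid):
    """Hypothetical helper: one 15-column record, with '-' in the unused columns."""
    cols = ["-"] * 15
    cols[COL_A], cols[COL_B] = a, b
    cols[COL_TAXID_A] = cols[COL_TAXID_B] = f"taxid:{taxid}"
    return "\t".join(cols)


if __name__ == "__main__":
    text = "\n".join([
        "#ID(s) interactor A\tID(s) interactor B",
        make_record("TLR4", "MYD88", 9606),
        make_record("MYD88", "IRAK4", 9606),
        make_record("Tlr4", "Myd88", 10090),
        make_record("Tlr4", "Cd14", 10090),
    ])
    human = load_edges(io.StringIO(text), "9606")
    mouse = load_edges(io.StringIO(text), "10090")
    orthologs = {"Tlr4": "TLR4", "Myd88": "MYD88", "Cd14": "CD14"}
    print(conserved_edges(mouse, human, orthologs))
```

Edges are stored as frozensets so that A-B and B-A compare equal. This is appropriate for physical interactions but would lose direction for regulatory edges.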

InnateDB provides a flexible web-based interface that supports searching by molecule, interaction, or pathway, and it lists synonyms and external identifiers for genes and proteins. More complex queries can be built with Boolean operators, and molecular interactions can be searched by interaction type, cell or tissue type, and other annotations. Users can also retrieve the genes that belong to a particular pathway along with all the genes that interact with them. The results of such searches can then be visualized using the Cerebral plug-in (Barsky et al., 2007) for Cytoscape (Shannon et al., 2003). This tool, also created by the InnateDB team, can display a network (e.g., the result of a pathway search) with a localization-based layout. In this layout the nodes are arranged according to their subcellular localization, such as the nucleus, cytoplasm, or cell surface. If expression data are also provided, they can be superimposed on the network by coloring each node according to the expression level of the corresponding gene or protein.
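The pathway-plus-interactors search and the localization-based display can be sketched in a few lines. This is not Cerebral's layout algorithm or an InnateDB API. The compartment ordering, the `localization` and expression dictionaries, and the blue-white-red color scale are assumptions chosen for illustration.

```python
# Hypothetical ordering of compartments, from outside the cell inwards.
LAYERS = ["extracellular", "cell surface", "cytoplasm", "nucleus", "unknown"]


def expand_pathway(pathway_genes, edges):
    """Return the pathway genes plus every gene that interacts with at least one of them."""
    members = set(pathway_genes)
    result = set(members)
    for a, b in edges:
        if a in members:
            result.add(b)
        if b in members:
            result.add(a)
    return result


def layer_nodes(nodes, localization):
    """Group nodes into ordered compartment layers; unannotated nodes go to 'unknown'."""
    layers = {name: [] for name in LAYERS}
    for node in sorted(nodes):
        place = localization.get(node, "unknown")
        layers[place if place in layers else "unknown"].append(node)
    return [(name, layers[name]) for name in LAYERS if layers[name]]


def expression_color(log_fc, max_abs=2.0):
    """Map a log fold change to a hex color: blue (down), white (0), red (up)."""
    t = max(-1.0, min(1.0, log_fc / max_abs))  # clip to [-1, 1]
    fade = round(255 * (1 - abs(t)))           # 255 = white, 0 = fully saturated
    if t >= 0:
        return f"#ff{fade:02x}{fade:02x}"
    return f"#{fade:02x}{fade:02x}ff"


if __name__ == "__main__":
    edges = [("TLR4", "MYD88"), ("MYD88", "IRAK4"), ("IRAK4", "TRAF6"), ("NFKB1", "RELA")]
    nodes = expand_pathway({"TLR4", "MYD88"}, edges)
    localization = {"TLR4": "cell surface", "MYD88": "cytoplasm", "IRAK4": "cytoplasm"}
    log_fc = {"TLR4": 1.5, "MYD88": 0.4, "IRAK4": -0.8}
    for compartment, members in layer_nodes(nodes, localization):
        print(compartment, [(g, expression_color(log_fc[g])) for g in members])
```

The expansion is deliberately limited to first neighbors. TRAF6 is not added because its only partner, IRAK4, is itself an interactor rather than a pathway member.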

The creators of InnateDB have clearly recognized that a flexible, integrated analysis framework is needed if systems approaches to understanding immunity are to fulfill their promise. Indeed, in addition to the highly useful Cerebral plug-in, which includes several standard analysis tools such as k-means clustering, InnateDB offers basic pathway and gene ontology enrichment analysis. These tools identify pathways or ontology terms that are enriched in sets of differentially expressed genes. Although these capabilities are useful and can generate novel insights, one can readily envision the need for more sophisticated analytical approaches that incorporate other data types or modeling formalisms. Interaction and pathway information could be integrated with:

- transcription factor binding site prediction algorithms (Lähdesmäki et al., 2008);
- transcription factor localization measurements from high-throughput technologies such as ChIP-seq;
- proteomic measurements;
- microRNA regulatory networks and their expression measurements; and
- statistical and mathematical approaches for reconstructing and simulating biomolecular networks (Bonneau et al., 2007; Ramsey et al., 2008).
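The commentary does not detail InnateDB's enrichment tools. A common way to score over-representation of a pathway in a differentially expressed gene set is a one-sided hypergeometric test, followed by a multiple-testing correction across pathways. The sketch below assumes a Benjamini-Hochberg correction and uses hypothetical gene and pathway identifiers. Let N be the number of genes in the universe, K the number of those in the pathway, n the number that are differentially expressed, and k the size of the overlap. The p-value is then P(X >= k) = sum over i from k to min(K, n) of C(K, i) C(N - K, n - i) / C(N, n).

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrichment(de_genes, pathways, universe):
    """Test each pathway for over-representation of differentially expressed genes."""
    universe = set(universe)
    de = set(de_genes) & universe
    N, n = len(universe), len(de)
    names, overlaps, pvals = [], [], []
    for name, members in pathways.items():
        members = set(members) & universe
        k = len(members & de)
        names.append(name)
        overlaps.append(k)
        pvals.append(hypergeom_sf(k, N, len(members), n))
    qvals = benjamini_hochberg(pvals)
    # Rows are (pathway, overlap, p-value, adjusted p-value), most significant first.
    return sorted(zip(names, overlaps, pvals, qvals), key=lambda r: r[2])


if __name__ == "__main__":
    universe = [f"G{i}" for i in range(20)]
    pathways = {"P1": universe[0:5], "P2": universe[10:15]}
    de_genes = universe[0:4]
    for name, k, p, q in enrichment(de_genes, pathways, universe):
        print(f"{name}: overlap={k} p={p:.3g} q={q:.3g}")
```

The choice of universe matters. Restricting it to genes actually measured on the platform avoids inflating significance with genes that could never have been called differentially expressed.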

These approaches require flexible and adaptable software architectures that support the rapid development of integrated tools for analyzing the heterogeneous data typical of systems biology (Boyle et al., 2008). Such an architecture is being used and further developed as part of a large NIAID-funded international consortium focusing on systems approaches to immunology. The project generates four kinds of data:

- genomic: SNPs, and forward genetic screens using ENU mutagenesis;
- transcriptomic: genome-wide exon-level expression;
- proteomic: membrane-bound or secreted proteins measured by mass spectrometry, and phospho-signaling measured by multiparameter flow cytometry;
- interaction: protein-DNA interactions measured by ChIP-chip or ChIP-seq.

These data are made available through a web-based data portal that provides access to the raw and processed data as well as to higher-level analysis results (Ramsey et al., 2008; Korb et al., 2008). Ultimately, seamless integration of the data in this portal with InnateDB's interaction and pathway information would enable more powerful analyses. Such analyses could yield new insights into how the immune system responds to infectious disease, both by inciting innate inflammatory reactions and by instructing adaptive immune responses.

The NIH is currently funding a number of large-scale programs directed at a systems-wide understanding of immunity. These include programs on innate immunity, signal transduction, lipidomics, glycomics, and inflammation, as well as the Immunology Database and Analysis Portal (ImmPort). In addition, the Bill and Melinda Gates Foundation has funded a number of large-scale vaccine initiatives. All of these programs are generating a mountain of data that need to be stored and analyzed, and InnateDB promises to be an important brick in the wall.


  • Barsky A, Gardy JL, Hancock RE, Munzner T. Bioinformatics. 2007;23:1040–1042.
  • Bonneau R, Facciotti MT, Reiss DJ, Schmid AK, Pan M, Kaur A, Thorsson V, Shannon P, Johnson MH, Bare JC, et al. Cell. 2007;131:1354–1365.
  • Boyle J, Cavnor C, Killcoyne S, Shmulevich I. BMC Bioinformatics. 2008;9:295.
  • Gilchrist M, Thorsson V, Li B, Rust AG, Korb M, Roach JC, Kennedy K, Hai T, Bolouri H, Aderem A. Nature. 2006;441:173–178.
  • Korb M, Rust AG, Thorsson V, Battail C, Li B, Hwang D, Kennedy KA, Roach JC, Rosenberger CM, Gilchrist M, et al. BMC Immunol. 2008;9:7.
  • Lähdesmäki H, Rust AG, Shmulevich I. PLoS ONE. 2008;3:e1820. doi: 10.1371/journal.pone.0001820.
  • Lynn DJ, Winsor GL, Chan C, Richard N, Laird MR, Barsky A, Gardy JL, Roche FM, Chan TH, Shah N, et al. Mol Syst Biol. 2008;4:218.
  • Oda K, Kitano H. Mol Syst Biol. 2006;2:2006.0015.
  • Ramsey SA, Klemm SL, Zak DE, Kennedy KA, Thorsson V, Li B, Gilchrist M, Gold ES, Johnson CD, Litvak V, et al. PLoS Comput Biol. 2008;4:e1000021. doi: 10.1371/journal.pcbi.1000021.
  • Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Genome Res. 2003;13:2498–2504.