Although considerable progress has been made in dissecting the signaling pathways involved in the innate immune response, it is now apparent that this response can no longer be productively thought of in terms of simple linear pathways. InnateDB (www.innatedb.ca) has been developed to facilitate systems-level analyses that will provide better insight into the complex networks of pathways and interactions that govern the innate immune response. InnateDB is a publicly available, manually curated, integrative biology database of the human and mouse molecules, experimentally verified interactions and pathways involved in innate immunity, along with centralized annotation on the broader human and mouse interactomes. To date, more than 3500 innate immunity-relevant interactions have been contextually annotated through the review of 1000 plus publications. Integrated into InnateDB are novel bioinformatics resources, including network visualization software, pathway analysis, orthologous interaction network construction and the ability to overlay user-supplied gene expression data in an intuitively displayed molecular interaction network and pathway context, which will enable biologists without a computational background to explore their data in a more systems-oriented manner.
database; gene expression; innate immunity; interaction network; pathway visualization
InnateDB (http://www.innatedb.com) is an integrated analysis platform that has been specifically designed to facilitate systems-level analyses of mammalian innate immunity networks, pathways and genes. In this article, we provide details of recent updates and improvements to the database. InnateDB now contains >196 000 human, mouse and bovine experimentally validated molecular interactions and 3000 pathway annotations of relevance to all mammalian cellular systems (i.e. not just immune relevant pathways and interactions). In addition, the InnateDB team has, to date, manually curated in excess of 18 000 molecular interactions of relevance to innate immunity, providing unprecedented insight into innate immunity networks, pathways and their component molecules. More recently, InnateDB has also initiated the curation of allergy- and asthma-related interactions. Furthermore, we report a range of improvements to our integrated bioinformatics solutions, including web service access to InnateDB interaction data using the Proteomics Standards Initiative Common Query Interface (PSICQUIC), enhanced Gene Ontology analysis for innate immunity, and the availability of new network visualization tools. Finally, the recent integration of bovine data makes InnateDB the first integrated network analysis platform for this agriculturally important model organism.
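The web service access mentioned above follows the Proteomics Standards Initiative Common Query Interface (PSICQUIC), whose REST flavour takes an MIQL query in the URL path and can return tab-delimited MITAB 2.5 records. The following is a minimal sketch of how a client might build such a request and parse the response. The base URL is a placeholder rather than InnateDB's actual endpoint, and the helper names and column keys are illustrative assumptions, not part of InnateDB.

```python
from urllib.parse import quote, urlencode

# Placeholder (assumption): replace with the PSICQUIC endpoint published by the provider.
BASE_URL = "https://psicquic.example.org/webservices/current/search"

# The 15 columns of a MITAB 2.5 record, in order (key names chosen for this sketch).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication_id",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_id", "confidence",
]


def build_query_url(miql, first=0, max_results=50, fmt="tab25", base_url=BASE_URL):
    """Build a PSICQUIC REST URL for an MIQL query string."""
    path = quote(miql, safe="")  # percent-encode the query, including ':' and spaces
    params = urlencode({"format": fmt, "firstResult": first, "maxResults": max_results})
    return f"{base_url}/query/{path}?{params}"


def parse_mitab25(text):
    """Turn MITAB 2.5 text into a list of dicts, one per interaction."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < len(MITAB25_COLUMNS):
            raise ValueError(f"expected 15 columns, got {len(fields)}")
        rows.append(dict(zip(MITAB25_COLUMNS, fields)))
    return rows
```

Fetching is deliberately left out so that the sketch runs offline. In practice the URL from `build_query_url` would be passed to an HTTP client and the response body to `parse_mitab25`.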
The innate immune response is the first line of defence against invading pathogens and is regulated by complex signalling and transcriptional networks. Systems biology approaches promise to shed new light on the regulation of innate immunity through the analysis and modelling of these networks. A key initial step in this process is the contextual cataloguing of the components of this system and the molecular interactions that comprise these networks. InnateDB (http://www.innatedb.com) is a molecular interaction and pathway database developed to facilitate systems-level analyses of innate immunity.
Here, we describe the InnateDB curation project, which is manually annotating the human and mouse innate immunity interactome in rich contextual detail, and present our novel curation software system, which has been developed to ensure interactions are curated in a highly accurate and data-standards compliant manner. To date, over 13,000 interactions (protein, DNA and RNA) have been curated from the biomedical literature. We also present data illustrating how InnateDB curation of the innate immunity interactome has greatly enhanced the network and pathway annotation available for systems-level analysis, and discuss the challenges that face such curation efforts. Significantly, we provide several lines of evidence that analysis of the innate immunity interactome has the potential to identify novel signalling, transcriptional and post-transcriptional regulators of innate immunity. These analyses also provide insight into the cross-talk between innate immunity pathways and other biological processes, such as adaptive immunity, cancer and diabetes, and, intriguingly, suggest links to other pathways that have not yet been implicated in the innate immune response.
In summary, curation of the InnateDB interactome provides a wealth of information to enable systems-level analysis of innate immunity.
The production and function of natural antibodies (NAbs) constitute an important mechanism of humoral innate immunity in vertebrates. The level of NAbs in chicken is heritable and the genetic background has been partly investigated. However, to date the genetic determination of the humoral innate immune response in avian species has not been fully described. The goal of this study was to propose a new set of candidate genes with a potential effect on the NAb phenotype for further SNP association studies.
In silico analysis of positional and functional candidate genes covered 14 QTL regions associated with LPS, LTA & KLH NAbs and located on six chromosomes: GGA5, GGA6, GGA9, GGA14, GGA18 and GGAZ. The function of the genes was subsequently determined based on the NCBI, KEGG, Gene Ontology and InnateDB databases.
As a result, a core panel of 38 genes participating in metabolic pathways of the innate immune response was proposed. Most of them were assigned to chromosomes GGA14, GGA5, GGA6 and GGAZ (13, 9, 8 and 5 genes, respectively). These candidate genes encode proteins predicted to play a role in (i) the proliferation, differentiation and function of B lymphocytes; (ii) the TLR signalling pathway; and (iii) the MAP signalling cascade.
The proposed set of candidate genes is recommended for inclusion in follow-up studies to model genetic networks of the innate humoral immune response in chicken.
The identification of novel candidate markers is a key challenge in the development of cancer therapies. This can be facilitated by putting accessible and automated approaches analysing the current wealth of ‘omic’-scale data in the hands of researchers who are directly addressing biological questions. Data integration techniques and standardized, automated, high-throughput analyses are needed to manage the data available as well as to help narrow down the excessive number of target gene possibilities presented by modern databases and system-level resources. Here we present CancerMA, an online, integrated bioinformatic pipeline for automated identification of novel candidate cancer markers/targets; it operates by means of meta-analysing expression profiles of user-defined sets of biologically significant and related genes across a manually curated database of 80 publicly available cancer microarray datasets covering 13 cancer types. A simple-to-use web interface allows bioinformaticians and non-bioinformaticians alike to initiate new analyses as well as to view and retrieve the meta-analysis results. The functionality of CancerMA is shown by means of two validation datasets.
The Saccharomyces Genome Database (SGD, http://www.yeastgenome.org) is the community resource for the budding yeast Saccharomyces cerevisiae. The SGD project provides the highest-quality manually curated information from peer-reviewed literature. The experimental results reported in the literature are extracted and integrated within a well-developed database. These data are combined with quality high-throughput results and provided through Locus Summary pages, a powerful query engine and rich genome browser. The acquisition, integration and retrieval of these data allow SGD to facilitate experimental design and analysis by providing an encyclopedia of the yeast genome, its chromosomal features, their functions and interactions. Public access to these data is provided to researchers and educators via web pages designed for optimal ease of use.
The innate immune system is an ancient component of host defense. Since innate immunity pathways are well conserved throughout many eukaryotes, immune genes in model animals can be used to putatively identify homologous genes in newly sequenced genomes of non-model organisms. With the initiation of the “i5k” project, which aims to sequence 5,000 insect genomes by 2016, many novel insect genomes will soon become publicly available, yet few annotation resources are currently available for insects. Thus, we developed an online tool called the Insect Innate Immunity Database (IIID) to provide an open access resource for insect immunity and comparative biology research (http://www.vanderbilt.edu/IIID). The database provides users with simple exploratory tools to search the immune repertoires of five insect models (including Nasonia), spanning three orders, for specific immunity genes or genes within a particular immunity pathway. As a proof of principle, we used an initial database with only four insect models to annotate potential immune genes in the parasitoid wasp genus Nasonia. Results specify 306 putative immune genes in the genomes of N. vitripennis and its two sister species N. giraulti and N. longicornis. Of these genes, 146 were not found in previous annotations of Nasonia immunity genes. Combining these newly identified immune genes with those in previous annotations, Nasonia possess 489 putative immunity genes, the largest immune repertoire found in insects to date. While these computational predictions need to be complemented with functional studies, the IIID database can help initiate and augment annotations of the immune system in the plethora of insect genomes that will soon become available.
The Mouse Genome Database (MGD) is the community model organism database for the laboratory mouse and the authoritative source for phenotype and functional annotations of mouse genes. MGD includes a complete catalog of mouse genes and genome features with integrated access to genetic, genomic and phenotypic information, all serving to further the use of the mouse as a model system for studying human biology and disease. MGD is a major component of the Mouse Genome Informatics (MGI, http://www.informatics.jax.org/) resource. MGD contains standardized descriptions of mouse phenotypes, associations between mouse models and human genetic diseases, extensive integration of DNA and protein sequence data, and normalized representation of genome and genome variant information. Data are obtained and integrated via manual curation of the biomedical literature, direct contributions from individual investigators and downloads from major informatics resource centers. MGD collaborates with the bioinformatics community on the development and use of biomedical ontologies such as the Gene Ontology (GO) and the Mammalian Phenotype (MP) Ontology. Major improvements to the Mouse Genome Database include a comprehensive update of genetic maps, the implementation of new classification terms for genome features, the development of a recombinase (cre) portal and the inclusion of all alleles generated by the International Knockout Mouse Consortium (IKMC).
Streptococcus salivarius is an early colonizer of human oral and nasopharyngeal epithelia, and strain K12 has reported probiotic effects. An emerging paradigm indicates that commensal bacteria downregulate immune responses through the action on NF-κB signaling pathways, but additional mechanisms underlying probiotic actions are not well understood. Our objective here was to identify host genes specifically targeted by K12 by comparing their responses with responses elicited by pathogens and to determine if S. salivarius modulates epithelial cell immune responses. RNA was extracted from human bronchial epithelial cells (16HBE14O- cells) cocultured with K12 or bacterial pathogens. cDNA was hybridized to a human 21K oligonucleotide-based array. Data were analyzed using ArrayPipe, InnateDB, PANTHER, and oPOSSUM. Interleukin 8 (IL-8) and growth-regulated oncogene alpha (Groα) secretion were determined by enzyme-linked immunosorbent assay. It was demonstrated that S. salivarius K12 specifically altered the expression of 565 host genes, particularly those involved in multiple innate defense pathways, general epithelial cell function and homeostasis, cytoskeletal remodeling, cell development and migration, and signaling pathways. It inhibited baseline IL-8 secretion and IL-8 responses to LL-37, Pseudomonas aeruginosa, and flagellin in epithelial cells and attenuated Groα secretion in response to flagellin. Immunosuppression was coincident with the inhibition of activation of the NF-κB pathway. Thus, the commensal and probiotic behaviors of S. salivarius K12 are proposed to be due to the organism (i) eliciting no proinflammatory response, (ii) stimulating an anti-inflammatory response, and (iii) modulating genes associated with adhesion to the epithelial layer and homeostasis. S. salivarius K12 might thereby ensure that it is tolerated by the host and maintained on the epithelial surface while actively protecting the host from inflammation and apoptosis induced by pathogens.
In the last decade, significant progress has been made in expanding the scope and depth of publicly available immunological databases and online analysis resources, which have become an integral part of the repertoire of tools available to the scientific community for basic and applied research. Herein, we present a general overview of different resources and databases currently available. Because of our association with the Immune Epitope Database and Analysis Resource, this resource is reviewed in more detail. Our review includes aspects such as the development of formal ontologies and the type and breadth of analytical tools available to predict epitopes and analyze immune epitope data. A common feature of immunological databases is the requirement to host large amounts of data extracted from disparate sources. Accordingly, we discuss and review processes to curate the immunological literature, as well as examples of how the curated data can be used to generate a meta-analysis of the epitope knowledge currently available for diseases of worldwide concern, such as influenza and malaria. Finally, we review the impact of immunological databases, by analyzing their usage and citations, and by categorizing the type of citations. Taken together, the results highlight the growing impact and utility of immunological databases for the scientific community.
Database; Epitope; Epitope prediction tools; T cell; Antibody
Biology is generating more data than ever. As a result, there is an ever increasing number of publicly available databases that analyse, integrate and summarize the available data, providing an invaluable resource for the biological community. As this trend continues, there is a pressing need to organize, catalogue and rate these resources, so that the information they contain can be most effectively exploited. MetaBase (MB) (http://MetaDatabase.Org) is a community-curated database containing more than 2000 commonly used biological databases. Each entry is structured using templates and can carry various user comments and annotations. Entries can be searched, listed, browsed or queried. The database was created using the same MediaWiki technology that powers Wikipedia, allowing users to contribute on many different levels. The initial release of MB was derived from the content of the 2007 Nucleic Acids Research (NAR) Database Issue. Since then, approximately 100 databases have been manually collected from the literature, and users have added information for over 240 databases. MB is synchronized annually with the static Molecular Biology Database Collection provided by NAR. To date, there have been 19 significant contributors to the project; each one is listed as an author here to highlight the community aspect of the project.
The pathobiology of common diseases is influenced by heterogeneous factors interacting in complex networks. CIDeR (http://mips.helmholtz-muenchen.de/cider/) is a publicly available, manually curated, integrative database of metabolic and neurological disorders. The resource provides structured information on 18,813 experimentally validated interactions between molecules, bioprocesses and environmental factors extracted from the scientific literature. Systematic annotation and interactive graphical representation of disease networks make CIDeR a versatile knowledge base for biologists, analysis of large-scale data and systems biology approaches.
The Mouse Genome Database (MGD) (http://www.informatics.jax.org) is one component of a community database resource for the laboratory mouse, a key model organism for interpreting the human genome and for understanding human biology. MGD strives to provide an extensively integrated information resource with experimental details annotated from both literature and on-line genomic data sources. MGD curates and presents the consensus representation of genotype (sequence) to phenotype information including highly detailed information about genes and gene products. Primary foci of integration are through representations of relationships between genes, sequences and phenotypes. MGD collaborates with other bioinformatics groups to curate a definitive set of information about the laboratory mouse. Recent developments include a general implementation of database structures for controlled vocabularies and the integration of a phenotype classification system.
The Candida Genome Database (CGD, http://www.candidagenome.org/) is an internet-based resource that provides centralized access to genomic sequence data and manually curated functional information about genes and proteins of the fungal pathogen Candida albicans and other Candida species. As the scope of Candida research, and the number of sequenced strains and related species, has grown in recent years, the need for expanded genomic resources has also grown. To answer this need, CGD has expanded beyond storing data solely for C. albicans, now integrating data from multiple species. Herein we describe the incorporation of this multispecies information, which includes curated gene information and the reference sequence for C. glabrata, as well as orthology relationships that interconnect Locus Summary pages, allowing easy navigation between genes of C. albicans and C. glabrata. These orthology relationships are also used to predict GO annotations of their products. We have also added protein information pages that display domains, structural information and physicochemical properties; bibliographic pages highlighting important topic areas in Candida biology; and a laboratory strain lineage page that describes the lineage of commonly used laboratory strains. All of these data are freely available at http://www.candidagenome.org/. We welcome feedback from the research community at firstname.lastname@example.org.
Reactome, located at http://www.reactome.org, is a curated, peer-reviewed resource of human biological processes. Given the genetic makeup of an organism, the complete set of possible reactions constitutes its reactome. The basic unit of the Reactome database is a reaction; reactions are then grouped into causal chains to form pathways. The Reactome data model allows us to represent many diverse processes in the human system, including the pathways of intermediary metabolism, regulatory pathways and signal transduction, as well as high-level processes such as the cell cycle. Reactome provides a qualitative framework on which quantitative data can be superimposed. Tools have been developed to facilitate custom data entry and annotation by expert biologists, and to allow visualization and exploration of the finished dataset as an interactive process map. Although our primary curational domain is pathways from Homo sapiens, we regularly create electronic projections of human pathways onto other organisms via putative orthologs, thus making Reactome relevant to model organism research communities. The database is publicly available under open source terms, which allows both its content and its software infrastructure to be freely used and redistributed.
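The reaction-centred data model and the ortholog-based projection described above can be illustrated with a small sketch. This is not Reactome's schema or inference procedure: the class names and the rule that a reaction is projected only when every participant has an ortholog are simplifying assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reaction:
    """Basic unit: named conversion of input entities into output entities."""
    name: str
    inputs: frozenset
    outputs: frozenset


@dataclass
class Pathway:
    """An ordered group of reactions."""
    name: str
    reactions: list = field(default_factory=list)


def project_reaction(reaction, orthologs):
    """Map a reaction onto another species, or return None if any participant
    lacks an ortholog (assumed all-or-nothing rule for this sketch)."""
    participants = reaction.inputs | reaction.outputs
    if not all(p in orthologs for p in participants):
        return None
    return Reaction(
        reaction.name,
        frozenset(orthologs[p] for p in reaction.inputs),
        frozenset(orthologs[p] for p in reaction.outputs),
    )


def project_pathway(pathway, orthologs):
    """Project each reaction and keep only those that could be mapped."""
    projected = (project_reaction(r, orthologs) for r in pathway.reactions)
    return Pathway(pathway.name, [r for r in projected if r is not None])
```

Here `orthologs` is a plain dictionary from human entity identifiers to identifiers in the target organism. A projected pathway may therefore contain fewer reactions than its human source.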
ZFIN, the Zebrafish Model Organism Database, http://zfin.org, serves as the central repository and web-based resource for zebrafish genetic, genomic, phenotypic and developmental data. ZFIN manually curates comprehensive data for zebrafish genes, phenotypes, genotypes, gene expression, antibodies, anatomical structures and publications. A wide-ranging collection of web-based search forms and tools facilitates access to integrated views of these data promoting analysis and scientific discovery. Data represented in ZFIN are derived from three primary sources: curation of zebrafish publications, individual research laboratories and collaborations with bioinformatics organizations. Data formats include text, images and graphical representations. ZFIN is a dynamic resource with data added daily as part of our ongoing curation process. Software updates are frequent. Here, we describe recent additions to ZFIN including (i) enhanced access to images, (ii) genomic features, (iii) genome browser, (iv) transcripts, (v) antibodies and (vi) a community wiki for protocols and antibodies.
The Mouse Genome Database (MGD) forms the core of the Mouse Genome Informatics (MGI) system (http://www.informatics.jax.org), a model organism database resource for the laboratory mouse. MGD provides essential integration of experimental knowledge for the mouse system with information annotated from both literature and online sources. MGD curates and presents consensus and experimental data representations of genotype (sequence) through phenotype information, including highly detailed reports about genes and gene products. Primary foci of integration are through representations of relationships among genes, sequences and phenotypes. MGD collaborates with other bioinformatics groups to curate a definitive set of information about the laboratory mouse and to build and implement the data and semantic standards that are essential for comparative genome analysis. Recent improvements in MGD discussed here include the enhancement of phenotype resources, the re-development of the International Mouse Strain Resource, IMSR, the update of mammalian orthology datasets and the electronic publication of classic books in mouse genetics.
The Mouse Genome Database (MGD) is a major component of the Mouse Genome Informatics (MGI, http://www.informatics.jax.org/) database resource and serves as the primary community model organism database for the laboratory mouse. MGD is the authoritative source for mouse gene, allele and strain nomenclature and for phenotype and functional annotations of mouse genes. MGD contains comprehensive data and information related to mouse genes and their functions, standardized descriptions of mouse phenotypes, extensive integration of DNA and protein sequence data, and normalized representation of genome and genome variant information including comparative data on mammalian genes. Data for MGD are obtained from diverse sources including manual curation of the biomedical literature and direct contributions from individual investigators’ laboratories and major informatics resource centers, such as Ensembl, UniProt and NCBI. MGD collaborates with the bioinformatics community on the development and use of biomedical ontologies such as the Gene Ontology and the Mammalian Phenotype Ontology. Recent improvements in MGD described here include integration of mouse gene trap allele and sequence data, integration of gene targeting information from the International Knockout Mouse Consortium, deployment of an MGI Biomart, and enhancements to our batch query capability for customized data access and retrieval.
The Mouse Genome Database (MGD) is one component of the Mouse Genome Informatics (MGI) system (http://www.informatics.jax.org), a community database resource for the laboratory mouse. MGD strives to provide a comprehensive knowledgebase about the mouse with experiments and data annotated from both literature and online sources. MGD curates and presents consensus and experimental data representations of genetic, genotype (sequence) and phenotype information including highly detailed reports about genes and gene products. Primary foci of integration are through representations of relationships between genes, sequences and phenotypes. MGD collaborates with other bioinformatics groups to curate a definitive set of information about the laboratory mouse and to build and implement the data and semantic standards that are essential for comparative genome analysis. Recent developments in MGD discussed here include an extensive integration of the mouse sequence data and substantial revisions in the presentation, query and visualization of sequence data.
GeneSigDB (http://www.genesigdb.org or http://compbio.dfci.harvard.edu/genesigdb/) is a database of gene signatures that have been extracted and manually curated from the published literature. It provides a standardized resource of published prognostic, diagnostic and other gene signatures of cancer and related disease to the community so they can compare the predictive power of gene signatures or use these in gene set enrichment analysis. Since GeneSigDB release 1.0, we have expanded from 575 to 3515 gene signatures, which were collected and transcribed from 1604 published articles largely focused on gene expression in cancer, stem cells, immune cells, development and lung disease. We have made substantial upgrades to the GeneSigDB website to improve accessibility and usability, including adding a tag cloud browse function, facetted navigation and a ‘basket’ feature to store genes or gene signatures of interest. Users can analyze GeneSigDB gene signatures, or upload their own gene list, to identify gene signatures with significant gene overlap and results can be viewed on a dynamic editable heatmap that can be downloaded as a publication quality image. All data in GeneSigDB can be downloaded in numerous formats including .gmt file format for gene set enrichment analysis or as a R/Bioconductor data file. GeneSigDB is available from http://www.genesigdb.org.
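To make the gene-overlap analysis and the .gmt download format concrete, the sketch below parses tab-delimited GMT text (set name, description, then gene symbols) and ranks signatures by their overlap with a user gene list. The abstract does not say how GeneSigDB scores overlap, so the one-sided hypergeometric test used here is an assumption, as are the function names and the `universe` parameter (the number of genes that could have been drawn).

```python
from math import comb


def parse_gmt(text):
    """Parse GMT text into {signature name: set of genes}."""
    signatures = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # need a name, a description and at least one gene
        name, _description, *genes = fields
        signatures[name] = {g for g in genes if g}
    return signatures


def overlap_p_value(overlap, sig_size, list_size, universe):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a universe that contains sig_size signature genes."""
    upper = min(sig_size, list_size)
    tail = sum(
        comb(sig_size, i) * comb(universe - sig_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(universe, list_size)


def rank_signatures(signatures, gene_list, universe):
    """Return (name, shared gene count, p-value) tuples, most significant first."""
    query = set(gene_list)
    results = []
    for name, genes in signatures.items():
        shared = genes & query
        p = overlap_p_value(len(shared), len(genes), len(query), universe)
        results.append((name, len(shared), p))
    return sorted(results, key=lambda r: r[2])
```

The sketch assumes every gene in the signatures and in the query belongs to the stated universe. With many signatures, the resulting p-values would also need a multiple-testing correction, which is omitted here.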
The Mouse Genome Database (MGD, http://www.informatics.jax.org/) integrates genetic, genomic and phenotypic information about the laboratory mouse, a primary animal model for studying human biology and disease. MGD data content includes comprehensive characterization of genes and their functions, standardized descriptions of mouse phenotypes, extensive integration of DNA and protein sequence data, and normalized representation of genome and genome variant information including comparative data on mammalian genes. Data within MGD are obtained from diverse sources including manual curation of the biomedical literature, direct contributions from individual investigators' laboratories and major informatics resource centers such as Ensembl, UniProt and NCBI. MGD collaborates with the bioinformatics community on the development of data and semantic standards such as the Gene Ontology (GO) and the Mammalian Phenotype (MP) Ontology. MGD provides a data-mining platform that enables the development of translational research hypotheses based on comparative genotype, phenotype and functional analyses. Both web-based querying and computational access to data are provided. Recent improvements in MGD described here include the association of gene trap data with mouse genes and a new batch query capability for customized data access and retrieval.
The BioHealthBase Bioinformatics Resource Center (BRC) (http://www.biohealthbase.org) is a public bioinformatics database and analysis resource for the study of specific biodefense and public health pathogens—Influenza virus, Francisella tularensis, Mycobacterium tuberculosis, Microsporidia species and ricin toxin. The BioHealthBase serves as an extensive integrated repository of data imported from public databases, data derived from various computational algorithms and information curated from the scientific literature. The goal of the BioHealthBase is to facilitate the development of therapeutics, diagnostics and vaccines by integrating all available data in the context of host–pathogen interactions, thus allowing researchers to understand the root causes of virulence and pathogenicity. Genome and protein annotations can be viewed either as formatted text or graphically through a genome browser. 3D visualization capabilities allow researchers to view proteins with key structural and functional features highlighted. Influenza virus host–pathogen interactions at the molecular/cellular and systemic levels are represented. Host immune response to influenza infection is conveyed through the display of experimentally determined antibody and T-cell epitopes curated from the scientific literature or as derived from computational predictions. At the molecular/cellular level, the BioHealthBase BRC has developed biological pathway representations relevant to influenza virus host–pathogen interaction in collaboration with the Reactome database (http://www.reactome.org).
The Gene3D structural domain database provides domain annotations for 7 million proteins, based on the manually curated structural domain superfamilies in CATH. These annotations are integrated with functional, genomic and molecular information from external resources, such as GO, EC, UniProt and the NCBI Taxonomy database. We have constructed a set of web services that provide programmatic access to this integrated database, as well as the Gene3D domain recognition tool (Gene3DScan) and protein sequence annotation pipeline for analysing novel protein sequences. Example queries include retrieving all curated GO terms for a domain superfamily or all the multi-domain architectures for the human genome. The services can be accessed using simple HTTP calls and are able to return results in a range of formats for quick downloading and easy parsing, graphical rendering and data storage. Hence, they provide a simple, but flexible means of integrating domain annotations and associated data sets into locally run pipelines and analysis software. The services can be found at http://gene3d.biochem.ucl.ac.uk/WebServices/.
The rapid pace at which genomic and proteomic data are being generated necessitates the development of tools and resources for managing data that allow integration of information from disparate sources. The Human Protein Reference Database (http://www.hprd.org) is a web-based resource, built on open source technologies, that provides information on several aspects of human proteins, including protein–protein interactions, post-translational modifications, enzyme–substrate relationships and disease associations. This information was derived manually by a critical reading of the published literature by expert biologists and through bioinformatics analyses of the protein sequence. This database will assist in biomedical discoveries by serving as a resource of genomic and proteomic information and providing an integrated view of sequence, structure, function and protein networks in health and disease.
Chemicals in the environment play a critical role in the etiology of many human diseases. Despite their prevalence, the molecular mechanisms of action and the effects of chemicals on susceptibility to disease are not well understood. To promote understanding of these mechanisms, the Comparative Toxicogenomics Database (CTD; http://ctd.mdibl.org/) presents scientifically reviewed and curated information on chemicals, relevant genes and proteins, and their interactions in vertebrates and invertebrates. CTD integrates sequence, reference, species, microarray, and general toxicology information to provide a unique centralized resource for toxicogenomic research. The database also provides visualization capabilities that enable cross-species comparisons of gene and protein sequences. These comparisons will facilitate understanding of structure-function correlations and the genetic basis of susceptibility. Manual curation and integration of cross-species chemical-gene and chemical-protein interactions from the literature are now underway. These data will provide information for building complex interaction networks. New CTD features include: 1) cross-species gene, rather than sequence, query and visualization capabilities; 2) integrated cross-links to microarray data from chemicals, genes, and sequences in CTD; 3) a reference set related to chemical-gene and protein interactions identified by an information retrieval system; and 4) a “Chemicals in the News” initiative that provides links from CTD chemicals to environmental health articles from the popular press. Here we describe these new features and our novel cross-species curation of chemical-gene and chemical-protein interactions.