For each host/pathogen interaction condition, BioSignatureDS™ modeled and scored 219 known metabolic and signaling pathways and 1620 biological processes (gene groups associated with Gene Ontology (GO) terms) at four time points. DBGGA was employed to identify the perturbations between pathogen conditions for pathways, GO categories, and genes. The DBGGA method generates Bayesian log likelihood scores that are normalized and transformed to a z-score equivalent (hereafter Bayesian z-score) so that all pathways and GO groups across all host/pathogen conditions can be equivalently compared and assessed for significance.
DBGGA pathway analysis
Of the 219 signaling/metabolic pathways scored, we focused on a subset of immune response related pathways as listed in Figure . This figure shows a heat map comparison of pathway Bayesian z-scores between pathogen conditions over time post infection. There were considerable differences between the host response profiles. MAP had strong early (30 minute) induction of the majority of its pathways and appeared to reverse to a more suppressive state by 240 minutes. STM’s early response indicated mild perturbations at 30 minutes that increased over time until several pathways were strongly induced by 240 minutes. BMEL was more strongly suppressive for the majority of pathways over time. At early times (30, 60 minutes) there were a few commonly induced pathways: Antigen Processing and Presentation, B Cell Receptor Signaling, Fc epsilon RI Signaling, Hedgehog Signaling, and Natural Killer Cell Mediated Cytotoxicity. In contrast, only ECM-receptor Interaction, Apoptotic Signaling and Apoptotic DNA Fragmentation had similar suppressions at 30 and 60 minutes. Interestingly, there was no single pathway at later times (120, 240 minutes) with similar perturbed states, implying that the host defenses have divergent biosignatures against the various virulent mechanisms presented by the pathogens.
Figure 1 Heat map comparison of pathway scores for each host condition by sampling time point post infection. The score magnitudes are shown as a gradient color from light to bright red for induced and from light to bright green for suppressed pathway activity. (more ...)
Significant divergent responses between conditions were observed for MAPK Signaling. For MAP, the MAPK pathway reversed from induced to suppressed, while STM increased induction and BMEL maintained a suppressed state. The MAPK Signaling Pathway was implicated in bacterial pathogenesis for a number of pathogens such as Salmonella enterica
serovar Typhimurium [10
], and Mycobacterium
]. This pathway was selected for more detailed discussion with regard to gene perturbations and mechanistic interpretations. Figure is a heat map of significantly perturbed genes for the MAPK pathway by pathogen condition. In this figure, the genes are sorted in order of highest up modulation to lowest down modulation, and for a gene to be included in this figure, a Bayesian z-score>|2.24| at any one time point was required. The Bayesian z-score > |2.24| reflects 99% confidence in the data. It is easy to observe that the perturbed genes and their expression patterns are quite different between conditions. Surprisingly, of the 171 measured genes on this pathway, only two genes in Figure were found to be commonly perturbed across all three pathogen conditions: 1) IL1A
, which encodes interleukin 1 protein involved in various immune responses, inflammatory processes, and hematopoiesis; and 2) RASGRP1
, which encodes a protein characterized by the presence of a Ras superfamily guanine nucleotide exchange factor (GEF) domain that activates the Erk/MAP kinase cascade and regulates T-cell and B-cell development, homeostasis and differentiation. The perturbation of IL1A
is consistent with genes involved in immune response, but the expression patterns for these two genes vary significantly between pathogens.
Biosignature heat maps of gene scores for the MAPK signaling pathway for each host-pathogen condition. Brucella melitensis (BMEL), Salmonella enterica Typhimurium (STM) and Mycobacterium avium paratuberculosis (MAP).
Simply comparing and contrasting the expression patterns of perturbed genes was inadequate for deciphering the MAPK pathway response dynamics. Clearly, the uniqueness of the MAPK pathway responses suggested that very different invasion/evasion mechanisms have evolved for each pathogen. More sophisticated methods are needed to identify potential points of host response disruptions. This is done by interrogating the trained DBN model for the MAPK Pathway for genes that exceed threshold Bayesian z-scores>|2.24| (“mechanistic genes”) and gene-gene network relationships (arcs). For example, Figure shows the visualization of the MAPK pathway network. The network can be employed to visualize several key features that would otherwise be difficult to discern by looking at spread sheet lists of genes. For example, the state of gene modulation is distinguished by color coded nodes. The state of upstream and downstream genes can be easily identified. Various threshold levels can be entered to identify significantly perturbed genes (annotated with orange circles, Fig ). The strength of correlation between gene pairs is indicated by the color and thickness of the arcs connecting the genes.
Figure 3 MAPK pathway network model for bovine host infected with Salmonella enterica Typhimurium (STM). The model is trained and then used to score the pathway and individual genes. This figure is the MAPK pathway as a screen capture taken from BioSignatureDS™ (more ...)
In Table , we show a list of 20 specific gene-to-gene relations associated with the MAPK pathway (Figure ) having strong positive and/or negative arc weight correlations. We normalized the DBN arc weights to allow equivalent comparison to other pathway gene-gene relations. In this arc weight table, a few significant relationships are labeled (1 thru ) in the table and on the network (Figure ) with an encircled number and arrow pointing to the corresponding arc. It is hypothesized that virulence factors from each pathogen can have different disruptive influences on the host’s MAPK Signaling Pathway and that such disruption can be used to identify pathogenic mechanisms unique to each pathogen. For example, the relation arc TRAF2->FLNA had a strong positive weight (correlated) for STM-Host (0.241) and for the BMEL-Host (0.204) while MAP-Host had a large negative weight (anti-correlated) of -0.17. The reversal of gene-to-gene arc weight of MAP-Host may indicate a disruption of either TRAF or FLNA gene by virulent factors of the pathogens, identifying potential novel points of interaction. TRAF2 encodes a protein that is a member of the TNF receptor associated factor (TRAF) protein family. This protein is required for TNF-alpha-mediated activation of MAPK8/JNK and NF-κB. It has a binding relationship with the filamin-A protein encoded by the FLNA gene. FLNA participates in the anchoring of membrane proteins to the actin cytoskeleton. This type of interaction analysis can be done for every pathway and used to identify novel differences between pathogen conditions. The visualization of mechanistic genes and arc weight enables an efficient identification of the differences between pathogenic influences.
DBGGA gene ontology analysis
A DBGGA analysis was also conducted for gene GO categories. For each pathogen condition, 1620 biological process GO categories were scored. Each condition produced its own unique set of highly scored GO functions, but for comparison purposes, we chose a small subset of highly perturbed categories to illustrate the different temporal responses as shown in Figure . Figure illustrates the comparative analysis of gene scores for the phagocytosis GO Group.
Figure 4 (a) Gene ontology group scores using DBGGA A few groups are displayed here to illustrate the GO group scoring difference between pathogen conditions. (b) Gene scores for phagocytosis GO group Illustrating the temporal difference of gene expression Bayesian (more ...)
Biological system model generation for comparative host response analysis
Utilizing BioSignatureDS™, the significantly perturbed pathways and gene groups from DBGGA were integrated to construct a plausible system level model of the disease/condition. The system model encompasses whole time-course patterns and multi-conditional behaviors of a large group of genes/proteins. The system model can be used for more efficient comparative modeling, pattern recognition and simulation supporting “what-if” type of analyses. The system model is constructed from a method based on merging of pathways with known gene/protein relationships and produces a trained and optimized network model similar to the pathway network shown in Figure . Following this procedure, a system model was constructed from 13 selected pathways (listed in Table ) showing significant perturbation across the three pathogen conditions. As can be seen in this table, pathways have a remarkable difference in perturbation that is exploited for identifying unique and common gene patterns. The resulting model has a common network structure that is trained using the host response data. This model is employed for comparing biosignatures between the different pathogen host responses and was comprised of 622 genes and over 750 gene-to-gene relations. We trained three models, each with data from a different host-pathogen condition. For each model and each host-pathogen time point data sample, we tested the hypothesis that the sample did not fit the model at a confidence level of 95%. Table provides the results of this testing in terms of sensitivity and specificity for each of the models listed.
Pathways Utilized for System Model Generation
Sensitivity and Specificity Model Assessment
Cross-species protein-protein interactome model
Ultimately, the goal is to develop interactome models between the pathogen and the host to make predictions of protein-protein interactions (PPIs). A preliminary STM-Host model (model not shown herein) was created using BioSignatureDS™ augmented with new computational methods to learn PPIs between the pathogen and the host. Model PPI structure learning utilized (host and STM) microarray gene expressions and mass spec protein levels (extracted and measured simultaneously from the bovine host [14
]) guided by a prior
biological interactions and cross-species PPI predictions. The model identified a set of 34 known and novel STM-Host protein interactions. These predicted interactions will be employed in guiding future biological experiments for their confirmation as host-pathogen virulence factors and are candidates for points of intervention. PPI interactome models will be developed for BMEL-Host and MAP-Host and full results reported in future publications.