PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of bbiLibertas AcademicaJournal Home
 
Bioinform Biol Insights. 2012; 6: 187–202.
Published online 2012 September 3. doi:  10.4137/BBI.S9954
PMCID: PMC3448498

The Core Mouse Response to Infection by Neospora Caninum Defined by Gene Set Enrichment Analyses

Abstract

In this study, the BALB/c and Qs mouse responses to infection by the parasite Neospora caninum were investigated in order to identify host response mechanisms. Investigation was done using gene set (enrichment) analyses of microarray data. GSEA, MANOVA, Romer, subGSE and SAM-GS were used to study the contrasts Neospora strain type, Mouse type (BALB/c and Qs) and time post infection (6 hours post infection and 10 days post infection). The analyses show that the major signal in the core mouse response to infection is from time post infection and can be defined by gene ontology terms Protein Kinase Activity, Cell Proliferation and Transcription Initiation. Several terms linked to signaling, morphogenesis, response and fat metabolism were also identified. At 10 days post infection, genes associated with fatty acid metabolism were identified as up regulated in expression. The value of gene set (enrichment) analyses in the analysis of microarray data is discussed.

Keywords: mouse model, microarray, gene set, host response, immunity, neospora

Introduction

Microarray technology has been used extensively for the study of host responses to infection by a variety of Apicomplexa. Such studies have used well established methods for mining of data to extract the identity of genes, pathways and other host mechanisms differentially expressed within the data sets under study.14 The involvement of changes in expression of genes associated with specific pathways is typically investigated by the application of enrichment tests applied to gene ontology terms, thereby providing evidence for enrichment of pathways by association with these terms.5 Pathway and network analyses potentially offer a number of advantages for analyzing microarray datasets, notwithstanding the identification of specific pathways, and hence networks of genes that are affected by the experimental treatment under study.6,7

More recently, a variety of methods for gene set (enrichment) analysis have evolved, providing an alternative way of mining the transcriptional changes present in microarray data.812 Gene set analysis essentially asks whether the expression of a list of genes in a microarray data set is positively or negatively associated with one of the two experimental groups (eg, infected and uninfected mice). Gene sets represent lists of genes that are typically associated with a chromosomal location, biochemical pathway, or other computationally derived biological processes or experimentally studied systems such as cancer. Most published studies use gene sets downloadable from the Molecular Signatures (MSigDB) database (http://www.broadinstitute.org/gsea/msigdb/index.jsp), but personalized gene sets can also be created through tools such as WhichGenes.13

Various mice types support N. caninum infection. These include the BALB/c and Qs that have been described previously.14,15 The BALB/c mouse is susceptible to infection by N. caninum, whereas the Qs is considerably more resistant to infection by both highly pathogenic and less pathogenic strains such as NC-Liverpool and NC-Nowra respectively. The outcome of infection of BALB/c and Qs mice with N. caninum therefore differs. BALB/c mice typically become lethargic and their fur becomes ruffled, after which they may lose weight and may die,14 whereas the Qs mouse shows none of these signs of infection.15

In this study, gene set analysis was used to mine microarray data derived from a previous study that investigated changes in transcription in mice in response to N. caninum infection.3 In that study both BALB/c and Qs mice were used and their response to infection by NC-Liverpool and NC-Nowra investigated. This dataset was previously analyzed by significance analysis of microarrays, clustering, and gene ontology enrichment and these analyses have shown, notwithstanding, the complexity of the mouse response to infection. Here it is shown that gene set analyses provides a simple, yet effective, method by which to mine microarray data. For the record, elements of the core mouse response to infection by N. caninum are identified and discussed.

Methods

Data

Microarray data was derived from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) series record GSE23520. The data set is derived from a two color study containing two different mice types (BALB/c and Qs) and two different N. caninum (NCNowra and NC-Liverpool) isolates, studied at 6 hours post infection (HPI) and 10 days post infection (DPI). The experimental design, which included at least three mice per experimental group, is described elsewhere.3 Essentially mice were infected with N. caninum and RNA prepared from spleen removed at 6 HPI or 10 DPI. mRNA was converted into cDNA, which was hybridized to microarrays containing the NIA15K gene set, using RNA from uninfected BALB/c or Qs mice as a control. The Kooperberg model-based background correction was applied to the raw microarray data, after which the expression log ratios were normalized within the slide (using print-tip loess). Array data were then RMA normalized within treatments prior to linear model fitting.

As gene set analysis requires gene annotation information, all data for genes in the array data set that did not contain annotation were removed, as were data for multiple probes representing the same gene. Annotation information was obtained from ID converter.16

Gene sets (GMT format) were obtained from the MSigDB for all categories (human, C1–C5). Gene identifiers used in the array dataset and gene sets were gene symbols. Mouse gene sets were also obtained from WhichGenes in the GMT format and the WEHI website (http://bioinf.wehi.edu.au/software/MSigDB/index.html). The WEHI gene sets are derived from the MSigDB by conversion of human gene symbols to mouse orthologs, with reference to the Jackson Laboratory Human and Mouse Orthology Report. In these analyses, the mouse gene sets analyzed were restricted to those sets containing between 10 and 500 genes as recommended.17 A gene set compiled for tissues from the Stanford Microarray Database was also used (http://www.stat.stanford.edu/~tibs/GSA/).

Gene set analysis

All comparisons made involved investigating those gene sets that are associated with either (1) Mouse: meaning mouse type (BALB/c vs. Qs); (2) Type: meaning N. caninum strain (NC-Nowra vs. NC-Liverpool), or (3) Time: meaning time post infection (6 HPI vs. 10 DPI). Data from the GEO series record GSE23520 was pooled for the different groups (Table 1) so that such comparisons could be made across the species N. caninum. For example, in order to compare the response to Qs and BALB/c mice, data from all Qs groups (infected with either N. caninum type and the two different time points) were considered as “Qs”. Similarly all groups from the BALB/c set of infections (infected with either N. caninum type and the two different time points) were considered as “BALB/c”.

Table 1
Experimental codes explaining how the data was analysed.a

Initially gene set enrichment analysis (GSEA)10,12 was performed using the JAVA desktop application18 with the human C1–C5 gene set collections (from the MSigDB), with 1000 permutations of the gene set.

Parametric (PAGE) analyses were performed using the webserver GAzer.19 1263 gene sets using the gene ontology (GO biological process and molecular function categories) pathway and chromosome categories included on the server were investigated. Analyses were done with the fold change option, specifying a minimum of five genes/significant gene set to be reported. More specific parametric gene set analyses (PGSEA from Bioconductor) of mouse gene sets were performed in R using gene sets obtained from WhichGenes, reporting only gene sets returning a P < 0.05.

Four additional methods of gene set enrichment analysis were used to analyze the microarray data. The first method used to analyze the data was SAM-GS which was introduced as an extension of the significance analysis of microarray approach to accommodate gene sets.20 In SAM-GS the sum of the squares of the test statistics from a t-test (or similar) for individual genes is used to calculate a gene set test statistic. A sample permutation is then used to obtain P values for each gene set. To do this, the R implementation of SAM-GS was used. It is available at http://www.ualberta.ca/~yyasui/homepage.html.

The second method used was multivariate analysis of variance.21 In the case where there are two genotype groups, a Hotelling’s T2 test is used to determine whether there are any differences in gene expression between the two groups. A number of non-equivalent tests are available if more than two genotype groups are to be compared. Shrinkage covariance matrix estimators are used to overcome the problem of having fewer samples of each phenotype than there are genes in a gene set. An R implementation of MANOVA was used that can be found at http://mail.cmu.edu.tw/~catsai/research.htm.

The third method that was used to analyze the microarray data was subGSE.22 In this method the genes within each gene set are ranked based on their association with the phenotype group and “strict subsets” of the top i genes are permuted to obtain raw P values, the smallest of which is used to test gene set significance. The advice of the authors of this paper was adopted in that a minimum strict subset size of C = 5 genes was used to ensure that the tests are not too sensitive to single genes with strong association. The C++ implementation of this algorithm was used, which can be found at http://www.rcf.usc.edu/~fsun/Programs/SubGSEWebPages/SubGSEMain.html.

The final method that used was Romer, the rotation mean rank version of GSEA for linear models. This method uses a rotation based simulation to obtain P values. This simulation method was introduced in23 and is discussed in terms of gene analysis elsewhere.24 Ten thousand rotations of the mouse gene sets (categories C1–C5 limited to 10 to 500 genes/gene set) were used, which appeared to give reasonably stable gene set rankings between simulations. The limma package in R was used to perform this analysis.

Once the P values for each test were obtained, the Q value package in R was used to obtain the estimated false discovery rates for each gene set within each test. The method in the Q value package is based on the theory of Q values.25,26

In all the analyses described here, gene sets returning a Q value of <0.25 were initially recorded and lists of gene sets were sorted according to their Q and P values. In order to reduce the lists down from thousands of gene sets to a more manageable number, gene sets ranked by P values (Manova, SAM-GS, Romer, subGSE) were selected representing the top 10% quantile of the cumulative frequency distribution. Gene lists representing those genes in the gene set were used in the Gene_ Search excel macro to extract expression data from the original microarray expression matrix. Gene_Search is a second generation version of the Find_Gene macro described elsewhere.3 The gene lists were then filtered by value so as to identify those genes that met certain criteria based on M values. For example, genes were extracted where M > 0.1 and M < −0.1 for all four groups in the array data representing 6 HPI or 10 DPI or all eight groups from 6 HPI and 10 DPI together. Venn diagrams were constructed using Venny (http://bioinfogp.cnb.csic.es/tools/venny/index.html).

Network analyses

Results obtained from gene set analyses using subGSE (ranked by P value) were displayed by enrichment mapping in cytoscape.17 Descriptors for node clusters were generated using the WordCloud plugin.

Results

GSEA

The expression dataset (GSE23520) has data for 15151 gene probes that after collapsing into gene symbols gave data for 10384 genes. A summary of the number of gene sets that were significantly associated with the three analyses conducted is shown in Table 2. The contrast Time gave a rich number of gene sets for study (258, 111 with Q < 0.1), followed by Mouse, showing the strongest informative signal occurring in the dataset at 6 HPI. The main pathways identified were Glycolysis and Gluconeogenesis, Oxidative Phosphorylation and Proteasome Pathway (Table 3). The highest ranked (by NES) gene set (ICHIBA_ GVHD; not shown) contained 102 genes including Irf1, H2-D1, pbef, S100A9, Nfkbia, H2-DMA, C1qb, Ly6a, Irf8, Tgtp and Gp49a. C5 gene sets gave results associated with a range of pathways such as Regulation of IκB kinase NF-κB Cascade (Ltbr, Myd88, Litaf, Bcl10, Ikbke), Structural Constituent of Ribosome (39 ribosomal proteins), Immune System Process (Ifitm3, Cd34, Il18, Irf8, Bcl10, Il2rg, Il6st, Cd164, Csf1, Cxcr4, Cd26), Inflammatory Response (S100A9, Nfkb1, Cxcl1, Cxcr4, Tnfaip6), and 21 genes of the Mitochondrial Envelope.

Table 2
Summary of GSEA results for Q < 0.25.
Table 3
Summary of GSEA results.

A comparison of the C1 sets associated with Type (Neospora) identified a single gene set (CHR13Q14, Q = 0.0044) that was negatively associated with NCNowra and positively correlated with NC-Liverpool. The leading edge genes in CHR13Q14 were Lcp1, Tpt1, Wbp4, Kpna3, Gtf2f2, and Itm2b. Il1R pathway gave the highest NES with C2 gene sets and contained genes Map3k7, Ikbkb, Tgfb2, Mapk14, Mapk8, Map2k3, Nfkb1, Chuk and Nkkbia as contributing to the core enrichment. MORF_TPT1 had the highest NES of the C4 sets and contained a large number of genes associated with translation including ribosomal proteins Rps5, Rpl5, Rpl7, Rpl31, Rpl15a, Rpl10, Rpl24, Rpl27, Rpl15, Rps25, Rpl12, Rpl22, Rpl38, Rpl11, Rps3a, Rps7, Rps27, Rps4x, Rpl32, Rps14, Rpl18a, Rpl19, Rps10, Rpl10a, Rpl30, Rpl27a and elongation factors Eif4a2, Eif4 g2, Eef1b2, Eef1 g. Gene sets Structural Constituent of Ribosome and Structural Molecule Activity of C5 also gave Q < 0.1 and also contained ribosomal proteins.

Investigations of Mouse identified gene sets primarily in the C2 and C4 categories. In C4 many of the gene sets contained genes associated with the cell cycle, such as cyclins A2, H, B2 and F plus other associated regulatory molecules such as Cdc6, Cdc28, Cdnk2c, Cks1b, Cks2, and Tgfbr3.

Parametric gene set analysis (PAGE)

From the three analyses conducted using GAzer, 166 gene sets were associated with one or more of the contrasts, 66 were associated with Time and Mouse, and 34 were associated with Type. Those gene sets with a Q < 0.1 are listed in Table 4. Seven of the sets featured in two or more of the PAGE analyses outcomes: Hemoglobin complex, Heme Biosynthesis_ GenMAPP, Glutathione metabolism_KEGG, Proteasome_KEGG, Proteasome Core Complex (sensu Eukaryota), Immune Response and Mitochondrial fatty acid betaoxidation_GenMAPP.

Table 4
GAzer results for Q < 0.1.

Both these initial studies using GSEA and PAGE had indicated that Time was rich in correlation with gene sets, indicating that the expression profiles obtained at 6 HPI and 10 DPI are quite different. Further analyses to confirm this observation was performed using PGSEA (in R) with a cut-off P value of 0.05. Using the Tissues gene set, 34 of them were significantly associated with Time. Gene expression data for genes present in these 34 gene sets were extracted; 20 genes were up-regulated in most (7/8 or 8/8) experimental groups studied. Two of the genes (Bst2, Mcl1) are associated with the GO term Regulation of Apoptosis while three were associated with Cell Cycle (Atm, Lats2, Wee1). Eighty six genes were primarily up-regulated at 6 HPI while 132 showed raised expression at 10 DPI (not shown). These gene lists were further investigated by GO enrichment. At 6 HPI the gene list was linked to terms such as Natural Killer Cell Mediated Cytotoxicity, Glycolysis and Cell Redox Homeostasis amongst others. At 10 DPI the gene list was linked to GO terms associated with a range of Cell Division activities, as well as terms such as Protein Amino Acid Phosphorylation, Cell Respiration and Signaling Pathways.

SAM-GS and MANOVA

No gene sets were identified by SAM-GS that correlated with either of the contrasts Mouse or Type, with all gene sets giving Q > 0.25. In contrast, all gene sets studied (8965) were significantly associated with Time (post infection), with all Q values less than 0.06 with most being < 0.003. Similarly with MANOVA, no gene sets were identified that correlated with either of the contrasts Mouse or Type, with all gene sets giving Q > 0.32. 6378 gene sets were significantly associated with Time, with all Q values < 0.034.

Identification of core host response

Experience with SAM-GS and MANOVA produced very long lists of gene sets that were significantly correlated with Time. In order to identify those gene sets that were highly correlated with Time, gene set analyses was repeated using Romer and subGSE and gene sets in the 10% quantile of each list were extracted and the lists compared. Thirty seven gene sets were common to the lists generated by MANOVA, SAM-GS, Romer and subGSE (Table 5). Visual inspection of the gene lists associated with each gene set showed that some gene sets shared genes for which data was responsible for the enrichment observed. For example, Response to UV and Response to Light Stimulus shared similar lists of genes when the microarray data was considered. Similarly, Cytoplasmic Vesicle Part and Cytoplasmic Vesicle Membrane also shared gene lists. Consequently the presence of unique genes in the 37 gene sets was investigated. Extraction of the gene data was performed using the excel macro Gene_ Search and 1951 unique genes were identified in these 37 gene sets. The gene data was further filtered according to value using criteria specific for either 6 HPI, 10 DPI or both (using M > 0.1 or < −0.1; Table 6). Figure 1 shows a Venn diagram that summarizes the number of genes meeting these criteria. The 10 DPI had higher numbers of genes showing altered expression compared to the 6 HPI time point. The identity of the genes affected was also different.

Figure 1
Venn diagram summarizing the number of genes changing in expression at the two different time points (from Table 6).
Table 5
Thirty seven common gene sets found in the top 10% quantile of lists derived by MANOVA, SAM-GS, Romer and subGSE.
Table 6
Number of unique genes defined in the core mouse response to N. caninum from the gene sets of Table 5.

A small number of genes were identified whose expression involved a greater than two fold change in expression in these 37 gene sets (Table 7). Four genes were up-regulated at 6 HPI and four at 10 DPI. None were identified in any other possible group. Aak1, which functions in clathrin mediated endocytosis, showed the highest fold change (81 fold at 10 DPI) in BALB/c mice infected with NC-Liverpool. Expression of Acadvl, associated with mitochondrial long chain fatty acid beta-oxidation in the mouse, was raised in mice at 10 DPI infected with NC-Liverpool. The significance of fatty acid metabolism at 10 DPI is discussed below.

Table 7
List of genes with a twofold (or greater) change in expression identified from the 37 gene sets of Table 5.

Discussion

Gene set enrichment is a method that investigates whether predefined sets of genes are differentially expressed between two sets of conditions (eg, infected vs. uninfected)8,10,12,20 present in microarray data. The most commonly used method is gene set enrichment analysis (GSEA). However, debate and the recognition of shortcomings in gene set analyses7,20 has led to the emergence and validation of other approaches such as SAM-GS and MANOVA for gene set analysis.2729

The microarray data sets used in this study were derived from a series of experiments that involved infection of Qs and BALB/c mice with N. caninum (NC-Nowra or NC-Liverpool strains). Two different time points (6 HPI and 10 DPI) were studied, providing an insight into the host responses occurring early during infection.3 Qs mice are relatively resistant to infection by N. caninum,15 whereas responses in BALB/c are strain specific. NC-Liverpool, for example, is very pathogenic in the BALB/c mouse leading to weight loss, appearance of clinical signs such as head tilting and limb paralysis, and death.14 The comparison of these groups (Type: Nc-Nowra v NC-Liverpool, Mouse: Qs v BALB/c and Time: time post infection) should provide further understanding of an animal’s responses to infection by N. caninum and the mechanisms associated with disease and resistance.

Using the same data set, it was previously demonstrated that the transcriptional responses occurring in the spleen of mice was dependent on a number of factors including the strain of N. caninum used, as well as the mouse type and time post infection.3 The methods of differential gene analyses used included significance of microarrays, ANOVA and clustering methods.30,31 Alternatively, Bayes statistics using the functions lmFit, eBayes and topTable found in limma32 were used in association with gene enrichment methodologies that measured functional enrichment (of gene ontology terms). These approaches identify lists of genes that are assigned to biological processes and functions via the gene ontology language. In contrast the gene set approaches described here represent an alternative approach for mining microarray data. It is argued that informative signals, derived from multiple genes associated with a pathway for example, may be more easily identified than those associated with single genes alone. Such approaches may be beneficial in analyzing data sets where an association with a treatment or phenotype has yet to be identified. Consequently the mouse response to infection by N. caninum was examined here, in anticipation that gene set analyses would provide further insight into identifying host responses that are associated with neosporosis.

GSEA and PAGE were initially used to mine the expression data. The main reason behind this choice was that easily used web servers are available that can be used to rapidly analyze data by gene set analyses. Despite the limitations of these approaches, including use of human gene symbols in the gene sets, they identified that the largest number of gene sets were correlated with Time (post infection) rather than Mouse or Type. Subsequently, gene set analyses were conducted by SAM-GS, MANOVA, Romer and subGSE. The number of gene sets detected that correlated with the microarray expression data was very much dependent on the method used for gene set (enrichment) analyses, as well as the definition of the minimum number of genes to be included. SAM-GS, for example, found no correlations with Mouse or Type. Using expression data merged from both Qs and BALB/c mice types infected with either NC-Liverpool or NC-Nowra, the analyses showed that the host response is quite different at these two time points. Similar observations were made with all methods of analysis. For example, GSEA identified a range of gene sets with an immunological basis such as inflammatory responses and NF-κB signaling. The two time points chosen (6 HPI and 10 DPI) were based on the previous observations of others concerning the mechanisms of innate and adaptive immunity to N. caninum in the mouse. For example, γ-interferon is known to be one response molecule produced at these time points.33 PAGE identified the Jak-Stat cascade as one of the significant gene sets. Overall, these results indicated the timing of the host response (Time) needed to be further investigated for its importance in determining infection outcomes in terms of disease.

GSEA and PAGE both identified significant differences in the expression data derived from mice infected with NC-Nowra or NC-Liverpool (that is they were correlated with strain of N. caninum) suggesting that the mouse response to infection by these two strains is different. The two methods identified very different gene sets that differed between the groups. GSEA identified differences in molecules affecting translation (eg, ribosomal proteins) along with MAP kinase activity whereas PAGE suggested fatty acid metabolism differs along with proteasomal activity (plus others). SAM-GS and MANOVA found no associations in this category. The idea that fatty acid metabolism is influenced by the Neospora strain represents just one example where gene set analyses has provided new hypotheses to explore. The BALB/c and Qs mice differed, according to GSEA, by the expression of genes associated with the cell cycle. GAzer identified haemoglobin/heme-metabolism/oxygen-related gene sets as being significantly different between these mice types. Peroxidase and glutathione metabolism are also in the list produced by PAGE, identifying that redox metabolism differs between mouse types.

Overall the results obtained by GSEA and PAGE were similar with those obtained previously by analyses of individual gene data by SAM, clustering and ANOVA, followed by enrichment analyses of gene lists based on GO.3 The advantages of gene set analysis are, however, evident—unlike analyses of individual genes, it is advantageous to identify several genes of a pathway (gene set) that is altered by the experimental treatment, thereby flagging those pathways for future study. There are also, however, several drawbacks associated with gene set analysis. In the first instance, the presence of a differentially expressed gene in more than one gene set means that several of the associations found can occur simply because of the impact of gene membership on a gene set. An example can be found here in those gene sets that contain ribosomal proteins, which occurred in more than one gene set. The algorithms themselves have also come up for criticism. GSEA was shown to be subject to false positive and negative findings,20 and PAGE ignores gene-specific variances.8

Methods for gene set (enrichment) analyses are typically grouped as two types, competitive or self-contained, with the later gaining widespread popularity based on logical criteria.7,34 SAM-GS, MANOVA and subGSE are examples of self-contained methods for finding gene sets associated with two groups under study. The approach adopted here for identifying the mouse core responses was to select the top ranking gene sets identified by each of these analyses (including Romer) and to simply determine those present in the top 10% quantile of each. In this manner 37 gene sets containing 1521 unique genes were identified as featuring in the mouse response to N. caninum.

Host responses to N. caninum are known to be of the Th1-type and the present dogma is that resistance to infection is mediated via IFN-γ.35,36 Similar to those anti-parasitic mechanisms observed in T. gondii,37 host responses to N. caninum are shown in this and the accompanying studies3 to be extremely diverse in their nature. Of note is the statistical significance supporting claims for involvement of pathways associated with MyD88 and NF-κB, as well as Jak-Stat signaling in the mouse response, for which experimental evidence is now present.38,39 With T. gondii, mouse responses are also based on toll-receptor MyD88, NF-κB, and MAP kinase signaling, resulting in defined inflammatory responses.40 It is reassuring that gene set analyses has identified similar pathways, thereby providing a high degree of confidence in the results presented here and the claims behind the association of other pathways and mechanisms in the mouse response to N. caninum.

Finally, it is now possible to provide a more detailed, albeit general, summary of the core responses of the mouse in response to infection by N. caninum. The influence of γ-interferon on gene expression is extensively described and linked to a vast number of responses including those of dendritic and other antigen- presenting cells, natural killer cells, macrophages and T helper and Treg cells, to name just a few.41 Systems biology approaches have led to the curation of the widespread influence of γ-interferon on gene expression; 31 of the top 50 network hubs (genes) of the γ-interferon network42 were present in the dataset studied here. The fold changes in expression associated with them were relatively small (generally in the 1.1–3 range, eg, Nfkbia and Irf8 were increased across all the groups studied). Five of the hub genes (Irf1, Irf3, Ctnnb1, Raf1, Map3k7) were reduced in expression at 10 DPI by up to 50% of the level shown by uninfected mice (not shown). Text mining using SciMinder identifies 1562 genes linked to a search through the keyword “gamma interferon”; 355 were present on the arrays used here. Only five (Stat1, Irf1, Ccnd2, Lap3, Nod1) showed a greater than twofold increase in expression at either of the two time points studied and all were at 6 HPI.

Table 5 summarizes the identity of 37 gene sets associated with Time, identified by MANOVA, SAM-GS, Romer and subGSE, which define the core mouse response to infection by N. caninum. From a GO perspective there are a number of significant terms in this list, such as Protein Kinase Activity, Cell Proliferation and Transcription Initiation, which reflect core activities differing between the two time points post infection. The word clouds in Figure 2 attempt to summarize the simple terms associated with the core responses identified by just one of the methods used for gene set analyses (subGSE). Although the different methods of gene set enrichment are likely to generate slightly different word clouds as a result of the different results obtained from the enrichment analyses, subGSE was selected for illustration purposes only. Using KEGG based gene sets, two major nodes are observed in the enrichment map composed of transcription and regulation of metabolic process. Gene ontology gene sets provide word clouds focused on nodes related to regulation and protein. In the latter, regulation is linked to a variety of nodes describing functions such as Apoptosis, Programmed Cell Death, Signaling, Transduction, Kinase, Cascade and ikappaB. The protein node is connected to a wide number of other nodes describing protein functions such as Transport, Localization, Modification and Metabolic.

Figure 2
Wordcloud network derived from the subGSE analysis of time. 651 gene sets (P = 0.029) were selected and an enrichment map derived using (A) gene ontology or (B) KEGG as the source of gene sets.

At 6 HPI four genes were identified that showed at least a twofold change in expression in response to infection by N. caninum. Thioredoxin (Trx1) is a fundamental component of the pathways that maintain redox homoestasis,43 B3gnt5 is crucial for development of B cells in spleen,44 Cnn3 regulates phagocyte motility,45 and S100A9 is secreted by both neutrophils and monocytes early during infection.46

An interesting outcome of these analyses is the observed mouse response associated with fatty acid metabolism at 10 DPI. Acadvl, Adipor2 and Mest were all raised significantly in expression at 10 DPI in comparison to uninfected mice. Acadvl is a mitochondrial, very long-chain specific acyl-CoA dehydrogenase involved with the initial steps of fatty acid β oxidation that generates ATP in mitochondria.47 Adipor2 is a receptor for adiponectin, an anti-inflammatory adipocytokine produced by adipocytes.48 Adipor2 is typically expressed in the liver and disruption of Adipor2 results in decreased PPAR-α signaling and increased inflammation and oxidative stress, ultimately leading to glucose intolerance.49 Mest is induced in response to dietary fat50 and knock down of Mest expression prevents adipogenesis.51 Aak1 also showed raised expression in response to infection by N. caninum at 10 DPI; this Ser/Thr kinase triggers clathrin assembly during clathrin-mediated endocytosis, which is also a feature of adiponectin signaling.48 Such observations suggest a direct effect of infection by N. caninum on fat cells, as well as metabolism of fatty acids. Coincidentally, recent research using a new animal model (the fat-tailed dunnart) has provided direct evidence that the mass of body fat is dramatically reduced during the course of infection of a susceptible animal by N. caninum.52 Obviously there are important leads here to investigate further. For example, BALB/c mice infected with NC-Liverpool also tend to loose body weight rapidly from about day 10 DPI and this may also be associated with loss of body fat.

Another of the novel observations made here is the relatively large number of genes in these 37 gene sets that are associated with mammalian development and embryogenesis. This is demonstrated in the word cloud of Figure 2A as the group of nodes in the bottom left hand corner linked to development. Module.38 and the sets linked to Morphogenesis are examples of gene sets containing such genes. Mest is extensively expressed in fetal tissues,53 while Acadvl is also widely expressed.54 S100A9 is a proinflammatory mediator secreted by leukocytes at sites of infection or injury. Studies have also implicated the molecule in control of intrauterine infections,55 as well as the onset of labor.56

A developing theme is therefore that genes associated with host responses to pathogens are also associated with reproduction and fetal development. The multifunctional role of proteins such as those discussed here shows this to be a reasonable assertion. Another example is that of TGF-β, which is produced in response to infection but is also involved in a wide range of other processes including remodeling of the feto-maternal interface.57 A mechanism of molecules possessing many different functions may well represent a means for preserving the health of the pregnant animal during infection, at the potential expense of the unborn fetus. That pregnant mice often resorb fetuses in response to infection is well known and a simple illustration of this.15

Fetal death and abortion are the main clinical signs observed in cattle following infection by N. caninum and it is believed that the route and timing of infection determines the outcome of the pregnancy.58,59 Recently, progress in this area from studies on cattle indicates placental function may contribute to control of infection by N. caninum rather than simply being deleterious to fetal survival.60 Similarly, studies on fetal immunity suggest the timing of infection in relation to development of fetal immune competence is a key process in determining the outcome of infection on pregnancy.59,60 The studies described here and elsewhere provide additional leads to explore during investigations of cattle responses to N. caninum, especially during pregnancy. In cattle, there is evidence that liver Acadvl expression is correlated with serum nonesterified fatty acid levels.61 A duodenal infusion of alpha-linolenic acid into dairy cattle was also shown to have immunomodulating activity that was associated with changes in γ interferon.62 The link between fatty acid metabolism and inflammation is obviously one of the more important areas to be explored further. As it is recognized that immune responses differ between adult and fetus,63,64 studies on fetal immunity may also provide the clues needed to understand the link between neosporosis and fetal death and abortion.

Acknowledgements

We thank Qi Liu and Yutaka Yasui (University of Alberta) and Xiting Yan (Yale University) for their patience and assistance with installing SAM-GS and subGSE respectively.

Footnotes

Competing Interests

JE’s institution has patented previous work on Neospora caninum. Other authors disclose no competing interests.

Author Contributions

Conceived and designed the experiments: JE, SB. Analyzed the data: JE, SB, SG. Wrote the first draft of the manuscript: JE, SB. Contributed to the writing of the manuscript: JE, SB, SG, PK. Agree with manuscript results and conclusions: JE, SB, SG, PK. Jointly developed the structure and arguments for the paper: JE, SB, SG, PK. Made critical revisions and approved final version: JE, SB, SG, PK. All authors reviewed and approved of the final manuscript.

Disclosures and Ethics

As a requirement of publication author(s) have provided to the publisher signed confirmation of compliance with legal and ethical obligations including but not limited to the following: authorship and contributorship, conflicts of interest, privacy and confidentiality and (where applicable) protection of human and animal research subjects. The authors have read and confirmed their agreement with the ICMJE authorship and conflict of interest criteria. The authors have also confirmed that this article is unique and not under consideration or published in any other publication, and that they have permission from rights holders to reproduce any copyrighted material. Any disclosures are made in this section. The external blind peer reviewers report no conflicts of interest.

Funding

This research was funded by the University of Technology, Sydney.

References

1. Blader IJ, Manger ID, Boothroyd JC. Microarray analysis reveals previously unknown changes in Toxoplasma gondii-infected human cells. J Biol Chem. 2001 Jun 29;276(26):24223–31. [PubMed]
2. Morrison DA, Ellis JT. The design and analysis of microarray experiments: applications in parasitology. DNA Cell Biol. 2003 Jun;22(6):357–94. [PubMed]
3. Ellis J, Sinclair D, Morrison D, et al. Microarray analyses of mouse responses to infection by Neospora caninum identifies disease associated cellular pathways in the host response. Mol Biochem Parasitol. 2010 Dec;174(2):117–27. [PubMed]
4. Gobert GN, Moertel LP, McManus DP. Microarrays: new tools to unravel parasite transcriptomes. Parasitology. 2005 Oct;131(Pt 4):439–48. [PubMed]
5. Huang da W, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009 Jan;37(1):1–13. [PMC free article] [PubMed]
6. Ulitsky I, Krishnamurthy A, Karp RM, Shamir R. DEGAS: de novo discovery of dysregulated pathways in human diseases. PLoS One. 2010;5:e13367. [PMC free article] [PubMed]
7. Goeman JJ, Buhlmann P. Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics. 2007 Apr 15;23(8):980–7. [PubMed]
8. Irizarry RA, Wang C, Zhou Y, Speed TP. Gene set enrichment analysis made simple. Stat Methods Med Res. 2009 Dec;18(6):565–75. [PMC free article] [PubMed]
9. Abatangelo L, Maglietta R, Distaso A, et al. Comparative study of gene set enrichment methods. BMC Bioinformatics. 2009 Sep 2;10:275. 10:2752009. [PMC free article] [PubMed]
10. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005 Oct 25;102(43):15545–50. [PubMed]
11. Liu Q, Dinu I, Adewale AJ, Potter JD, Yasui Y. Comparative evaluation of gene-set analysis methods. BMC Bioinformatics. 2007 Nov 7;8:431. [PMC free article] [PubMed]
12. Mootha VK, Lindgren CM, Eriksson KF, et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003 Jul;34(3):267–73. [PubMed]
13. Glez-Pena D, Gomez-Lopez G, Pisano DG, Fdez-Riverola F. WhichGenes: a web-based tool for gathering, building, storing and exporting gene sets with application in gene set enrichment analysis. Nucleic Acids Res. 2009;37:W329–34. [PMC free article] [PubMed]
14. Atkinson R, Harper PA, Ryce C, Morrison DA, Ellis JT. Comparison of the biological characteristics of two isolates of Neospora caninum. Parasitology. 1999 Apr;118(Pt 4):363–70. [PubMed]
15. Quinn HE, Miller CM, Ryce C, Windsor PA, Ellis JT. Characterization of an outbred pregnant mouse model of Neospora caninum infection. J Parasitol. 2002 Aug;88(4):691–6. [PubMed]
16. Alibes A, Yankilevich P, Canada A, Diaz-Uriarte R. IDconverter and IDClight: conversion and annotation of gene and protein IDs. BMC Bioinformatics. 2007 Jan 10;8:9. [PMC free article] [PubMed]
17. Merico D, Isserlin R, Stueker O, Emili A, Bader GD. Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS One. 2010 Nov 15;5(11):e13984. [PMC free article] [PubMed]
18. Subramanian A, Kuehn H, Gould J, Tamayo P, Mesirov JP. GSEA-P: a desktop application for Gene Set Enrichment Analysis. Bioinformatics. 2007 Dec 1;23(23):3251–3. Epub Jul 20, 2007. [PubMed]
19. Kim SB, Yang S, Kim SK, et al. GAzer: gene set analyzer. Bioinformatics. 2007 Jul 1;23(13):1697–9. [PubMed]
20. Dinu I, Potter JD, Mueller T, et al. Improving gene set analysis of microarray data by SAM-GS. BMC Bioinformatics. 2007 Jul 5;8:242. [PMC free article] [PubMed]
21. Tsai CA, Chen JJ. Multivariate analysis of variance test for gene set analysis. Bioinformatics. 2009;25(7):897–903. [PubMed]
22. Yan X, Sun F. Testing gene set enrichment for subset of genes: Sub-GSE. BMC Bioinformatics. 2008;9:362. [PMC free article] [PubMed]
23. Langsrud Ø. Rotation tests. Statistics and Computing. 2005;15
24. Wu D, Lim E, Vaillant F, et al. ROAST: rotation gene set tests for complex microarray experiments. Bioinformatics. 2010 Sep 1;26(17):2176–82. [PMC free article] [PubMed]
25. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003;100:9440–5. [PubMed]
26. Storey JD, Tibshirani R. Statistical methods for identifying differentially expressed genes in DNA microarrays. Methods Mol Biol. 2003;224:149–57. [PubMed]
27. Morine MJ, McMonagle J, Toomey S, et al. Bi-directional gene set enrichment and canonical correlation analysis identify key diet-sensitive pathways and biomarkers of metabolic syndrome. BMC Bioinformatics. 2010 Oct 7;11:499. [PMC free article] [PubMed]
28. Fridley BL, Jenkins GD, Biernacka JM. Self-contained gene-set analysis of expression data: an evaluation of existing and novel methods. PLoS One. 2010 Sep 17;5(9):e12693. pii. [PMC free article] [PubMed]
29. Montaner D, Dopazo J. Multidimensional gene set analysis of genomic data. PLoS One. 2010 Apr 27;5(4):e10348. [PMC free article] [PubMed]
30. Eisen MB, Brown PO. DNA arrays for analysis of gene expression. Methods Enzymol. 1999;303:179–205. [PubMed]
31. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001 Apr 24;98(9):5116–21. [PubMed]
32. Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3:Article 3. Epub Feb 12, 2004. [PubMed]
33. Khan IA, Schwartzman JD, Fonseka S, Kasper LH. Neospora caninum: role for immune cytokines in host immunity. Exp Parasitol. 1997 Jan;85(1):24–34. [PubMed]
34. Wang X, Dinu I, Liu W, Yasui Y. Linear combination test for hierarchical gene set analysis. Stat Appl Genet Mol Biol. 2011;10(1):Article 13. [PubMed]
35. Quinn HE, Ellis JT, Smith NC. Neospora caninum: a cause of immune-mediated failure of pregnancy. Trends Parasitol. 2002 Sep;18(9):391–4. [PubMed]
36. Quinn HE, Miller CM, Ellis JT. The cell-mediated immune response to Neospora caninum during pregnancy in the mouse is associated with a bias towards production of interleukin-4. Int J Parasitol. 2004 May;34(6):723–32. [PubMed]
37. Miller CM, Boulter NR, Ikin RJ, Smith NC. The immunobiology of the innate response to Toxoplasma gondii. Int J Parasitol. 2009 Jan;39(1):23–39. [PubMed]
38. Mineo TW, Benevides L, Silva NM, Silva JS. Myeloid differentiation factor 88 is required for resistance to Neospora caninum infection. Vet Res. 2009 Jul-Aug;40(4):32. [PMC free article] [PubMed]
39. Kim SK, Fouts AE, Boothroyd JC. Toxoplasma gondii dysregulates IFN-gamma- inducible gene expression in human fibroblasts: insights from a genome-wide transcriptional profiling. J Immunol. 2007 Apr 15;178(8):5154–65. [PubMed]
40. Denkers EY. Toll-like receptor initiated host defense against Toxoplasma gondii. J Biomed Biotechnol. 2010;2010:737125. [PMC free article] [PubMed]
41. Billiau A, Matthys P. Interferon-gamma: a historical perspective. Cytokine Growth Factor Rev. 2009 Apr;20(2):97–113. [PubMed]
42. Lynn DJ, Chan C, Naseer M, et al. Curating the innate immunity interactome. BMC Syst Biol. 2010 Aug 20;4:117. [PMC free article] [PubMed]
43. Sotirchos IM, Hudson AL, Ellis J, Davey MW. Thioredoxins of a parasitic nematode: comparison of the 16- and 12-kDA thioredoxins from Haemonchus contortus. Free Radic Biol Med. 2008 Jun 15;44(12):2026–33. [PubMed]
44. Kuan CT, Chang J, Mansson JE, et al. Multiple phenotypic changes in mice after knockout of the B3 gnt5 gene, encoding Lc3 synthase—a key enzyme in lacto-neolacto ganglioside synthesis. BMC Dev Biol. 2010 Nov 18;10:114. [PMC free article] [PubMed]
45. Huang QQ, Hossain MM, Wu K, et al. Role of H2-calponin in regulating macrophage motility and phagocytosis. J Biol Chem. 2008 Sep 19;283(38):25887–99. [PubMed]
46. Hobbs JA, May R, Tanousis K, et al. Myeloid cell function in MRP-14 (S100A9) null mice. Mol Cell Biol. 2003 Apr;23(7):2564–76. [PMC free article] [PubMed]
47. Chegary M, Brinke H, Ruiter JP, et al. Mitochondrial long chain fatty acid beta-oxidation in man and mouse. Biochim Biophys Acta. 2009 Aug;1791(8):806–15. [PMC free article] [PubMed]
48. Buechler C, Wanninger J, Neumeier M. Adiponectin receptor binding proteins—recent advances in elucidating adiponectin signalling pathways. FEBS Lett. 2010 Oct 22;584(20):4280–6. [PubMed]
49. Tilg H, Moschen AR. Role of adiponectin and PBEF/visfatin as regulators of inflammation: involvement in obesity-associated diseases. Clin Sci (Lond) 2008 Feb;114(4):275–88. [PubMed]
50. Koza RA, Rogers P, Kozak LP. Inter-individual variation of dietary fat-induced mesoderm specific transcript in adipose tissue within inbred mice is not caused by altered promoter methylation. Epigenetics. 2009 Oct 1;4(7):512–8. [PubMed]
51. Jung H, Lee SK, Jho EH. Mest/Peg1 inhibits Wnt signaling via regulation of LRP6 glycosylation. Biochem J. 2011 Jun 1;436(2):263–9. [PubMed]
52. King JS, McAllan B, Spielman DS, et al. Extensive production of Neospora caninum tissue cysts in a carnivorous marsupial succumbing to experimental neosporosis. Vet Res. 2011 Jun 2;42(1):75. [PMC free article] [PubMed]
53. McMinn J, Wei M, Sadovsky Y, Thaker HM, Tycko B. Imprinting of PEG1/MEST isoform 2 in human placenta. Placenta. 2006 Feb-Mar;27(2–3):119–26. [PubMed]
54. Oey NA, Ruiter JP, Ijlst L, et al. Acyl-CoA dehydrogenase 9 (ACAD 9) is the long-chain acyl-CoA dehydrogenase in human embryonic and fetal brain. Biochem Biophys Res Commun. 2006 Jul 21;346(1):33–7. [PubMed]
55. Park SJ, Yoon WG, Song JS, et al. Proteome analysis of human amnion and amniotic fluid by two-dimensional electrophoresis and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Proteomics. 2006 Jan;6(1):349–63. [PubMed]
56. Havelock JC, Keller P, Muleba N, et al. Human myometrial gene expression before and during parturition. Biol Reprod. 2005 Mar;72(3):707–19. [PubMed]
57. Sugawara K, Kizaki K, Herath CB, Hasegawa Y, Hashizume K. Transforming growth factor beta family expression at the bovine feto-maternal interface. Reprod Biol Endocrinol. 2010 Oct 15;8:120. [PMC free article] [PubMed]
58. Reichel MP, Ellis JT. Neospora caninum—how close are we to development of an efficacious vaccine that prevents abortion in cattle? Int J Parasitol. 2009 Sep;39(11):1173–87. [PubMed]
59. Williams DJ, Hartley CS, Bjorkman C, Trees AJ. Endogenous and exogenous transplacental transmission of Neospora caninum—how the route of transmission impacts on epidemiology and control of disease. Parasitology. 2009 Dec;136(14):1895–900. [PubMed]
60. Rosbottom A, Gibney H, Kaiser P, et al. Up regulation of the maternal immune response in the placenta of cattle naturally infected with Neospora caninum. PLoS One. 2011 Jan 19;6(1):e15799. [PMC free article] [PubMed]
61. Loor JJ, Dann HM, Everts RE, et al. Temporal gene expression profiling of liver from periparturient dairy cows reveals complex adaptive mechanisms in hepatic function. Physiol Genomics. 2005 Oct 17;23(2):217–26. [PubMed]
62. Sun P, Wang J, Yang G, Khas E, Liu Q. Effects of different doses of free alpha-linolenic acid infused to the duodenum on the immune function of lactating dairy cows. Arch Anim Nutr. 2010 Dec;64(6):504–13. [PubMed]
63. Witkin SS, Linhares IM, Bongiovanni AM, Herway C, Skupski D. Unique alterations in infection-induced immune activation during pregnancy. BJOG. 2011 Jan;118(2):145–53. [PubMed]
64. Williams DJ, Trees AJ. Protecting babies: vaccine strategies to prevent foetopathy in Neospora caninum-infected cattle. Parasite Immunol. 2006 Mar;28(3):61–7. [PubMed]

Articles from Bioinformatics and Biology Insights are provided here courtesy of Libertas Academica