In this study, we have undertaken an extensive characterization of the changes in gene transcription in C. elegans after infection with three bacterial and two fungal pathogens. We have confirmed genes and pathways shown to be involved in the immune response of the nematode in previous studies and extended the lists of genes that are up-regulated or down-regulated in response to infection with single or multiple pathogens. Furthermore, we have established lists of genes induced by bacterial versus fungal and intestinal versus epidermal infection. These lists characterize pathogen-class and tissue specificity of the immune response. This comparative transcriptomic study reveals the complex composition of the immune response of the nematode.
Comparison of transcriptional changes induced by different pathogens
Large-scale transcriptome studies suffer from variability linked to the experimental technique chosen, and even to the person carrying them out. Our homogenous transcriptome data sets allowed us to perform meaningful inter-pathogen comparisons and define genes induced as part of a shared response. Importantly, the RNA-seq data also allowed us for the first time to compare the responses to two fungal pathogens. Through a comparative analysis with our previous microarray data, we could also identify with high confidence genes specifically induced by one pathogen. This is important as experience has shown that the transcriptional response to infection is quite variable from one experiment to another. There are several reasons for this. One is that many of the genes regulated upon infection are also influenced by other environmental factors. A clear example is the expression of the gene nlp-29
which is partially dependent upon the osmolarity of the culture medium 
, which therefore needs to be carefully controlled. Secondly, microbial pathogenicity is also highly dependent on the precise culture conditions; there is not always a simple way to control tightly all the relevant parameters, especially since some are not even known. Thirdly, there is a dynamic interaction between host and pathogen, with the host response influencing pathogen virulence and vice versa. Given the intrinsic variability on both sides, subtle differences at the start of an infection can lead to marked dissimilarities in its progression. In the current analysis, the results of our previous study of the response to S. marcescens
using oligo-arrays 
stood out. As this was not the case for the results for the response to 2 other bacterial pathogens, and given the underrepresentation in the S. marcescens
data set of “common response genes” 
, this presumably reflects an experimental difference in the strength of the infection for the samples prepared for analysis using oligo-arrays.
A striking observation from the RNA-seq data was that among the genes commonly regulated upon infection by both D. coniospora and Harposporium there was a subset of genes that was regulated inversely after bacterial infection. Thus, we found around 80 genes up-regulated by both fungi but down-regulated by E. faecalis, P. luminescens or S. marcescens, and conversely there was an enrichment in bacterially-induced genes among the genes down-regulated upon infection with both fungal pathogens. In the former category, there were members of several antimicrobial peptide families, which explains in part the surprising bias towards shorter proteins in the shared fungal response. Another class that contributed to this bias was insulin-like molecules. The potential role of insulin signaling in regulating the response of C. elegans to fungal pathogens merits further study.
The neprilysin class of M13 peptidases is another example of a gene class differentially regulated between fungal and bacterial infection. Neprilysins are zinc metallopeptidases, generally found on the outer surface of animal cells. They cleave small signaling peptides (e.g. enkephalin, tachykinin and insulin) and thereby block their action. Of the 27 members in the C. elegans genome, 13 were down-regulated by D. coniospora or by Harposporium, whereas only 2 were induced. In contrast, one or more of the 3 bacteria led to an up-regulation of 9 of the M13 peptidases and to a down-regulation of 2. Only 2 members of the family in C. elegans have been well-characterized functionally, nep-1 and nep-2. Neither were differentially regulated. On the other hand, 6 of the M13 peptidase genes that exhibited differential expression were part of a cluster of 8 peptidase genes on the left arm of chromosome II. They may form part of a regulatory circuit governing autocrine or paracrine small peptide signaling, an aspect of the immune response that is currently poorly understood.
It is, however, important to bear in mind that if one tissue is the main site of the organism's transcriptional response to infection, measured changes in gene expression in different tissues may reflect the normalization procedure. To put it another way, in the case of RNA-seq, given that the absolute number of reads for a particular transcript is divided by the total number of aligned reads obtained with that sample, if there is intense transcriptional activity in the intestine, then the transcript level for epidermal genes will be artificially depressed. This could contribute to the observed down-regulation of the epidermally-expressed fip
genes after infection with any one of the three bacterial intestinal pathogens. Since some intestinal genes like mtl-1
are also down-regulated upon bacterial infection, this cannot explain all observed transcriptional changes. Indeed, there are now examples of pathogens triggering down-regulation of host genes (e.g. 
); this is clearly an area that merits further investigation.
The comparison of our new data sets to previous microarray studies proved surprisingly complicated, due to the use of different gene identifiers and WormBase versions. While the online tool WormMart 
is useful for changing identifiers within a single WormBase release, it is not designed to convert lists between releases. Furthermore, we occasionally found irrelevant genes in the output from WormMart using WS210, because of intrinsic errors in the way queries were handled by WormMart (I. Engelmann, unpublished observations). In developing the WormBase Converter, we circumvented these problems.
While at first sight this might appear a simple task, the nature of gene annotation changes makes following the evolution of gene structures surprisingly complex. In order to permit accurate cross-release comparisons, WormBase Converter has to take into account all successive changes in gene annotation and modifies the list of gene identifiers accordingly. Using this tool we became aware of a very occasional problem with WormBase data. Although the terminology applied to annotation changes, made manually by WormBase curators, should be invariant, this is not absolutely always the case. Thus, for example, while “split from” should mean that a new daughter gene is formed in part from an original parent gene, with retention of a modified parent gene, there are rare examples where during annotation the modified parent gene is deleted, without formally having been “killed”. Obviously, these odd database inconsistencies cannot be taken into account automatically by a tool like the WormBase Converter, which needs to run with fixed rules, but such rare inconsistencies can be readily identified and we included the option to correct manually the output files from the WormBase Converter.
One other minor limitation to the WormBase Converter is that no conversion of a list of gene names into transcript names is possible. This is a consequence of the intrinsic organization of WormBase data, since changes to gene structure between different releases are recorded using the WB ID identifier, which is a gene level identifier 
. On the positive side, our program has been written to maintain lists continuously up-to-date, now and in the future, which is important since new WormBase releases are scheduled every month. Since it is written as both a stand-alone and a server-based application, and can be run under Windows, Mac OS or Linux, we hope that it will be of widespread utility to laboratories dealing with lists of C. elegans
genes, as well as helping to improve the standard of consistent gene annotation.
Alternative methods to define up- or down-regulation
Most studies define up- or down-regulated genes on the basis of fold change in expression between two conditions. It remains a reasonable criterion for comprehensive comparisons of gene lists, and we therefore used it above to highlight commonalities and differences in the response of C. elegans
to different pathogens. The extended dynamic range of RNA-seq, however, potentially biases results based on fold change, as there is a greater relative variation in expression for lowly expressed genes 
. The fold change-based approach can thus yield a high proportion of false positive candidates in the area of low expression where the variability of expression is high for technical reasons.
Since RNA-seq gives transcript numbers as a read-out, one could alternatively calculate the absolute change in transcript numbers between two conditions (transcript number in condition A minus transcript number in condition B). But from a biological perspective, it is not at all obvious that change in absolute transcript number is a meaningful measure when looking across large sets of genes with an enormous range of expression levels. For example, one can imagine a case where there are 1000 transcripts of gene X under condition A, and 1100 under condition B; for gene Y there are 10 and 110, respectively. In both cases the change in absolute number is the same (100), but for gene X, the relative fold change is 1.1 and for gene Y it is ten-fold higher. One might expect the change in Y to have more biological impact.
Alternative methods have been developed but these all use either the Poisson distribution or Fisher's exact test to model data, neither of which cope well with biological variation 
We therefore experimented with two alternative approaches (see Methods
and Figure S3
for details). Both approaches identify differentially regulated genes on the basis of comparisons with genes of similar levels of expression under control conditions and additionally take into account variability at the lower end of the expression level in order to minimise false-positives. The two different approaches give overlapping but non-identical results (, Figure S4
). Inspection of the lists of genes that are identified using each method suggests that, depending on the purpose, one approach may be more appropriate. Thus, though the methods look promising, they clearly require further validation with data sets from a broad range of biological samples. Neither method, however, provides a solution for the correct quantitative analysis of transcripts with no detectable expression in one of the two conditions, as their ratio is either zero or infinity. These can, however, merit special attention. In our study, transcripts with no expression in the uninfected condition, but which are induced upon infection, could be considered the most infection-specific and the corresponding genes potentially interesting ones. We found 619 such genes (in WS210), including abf-4
, cnc-5, cnc-9
, ins-8, ins-32 and ins-39
and several nematode-specific peptides. Remarkably, only 4 of the 619 genes figure in the list of genes identified by oligo-array as being induced by D. coniospora
infection (data not shown). Methods based on differences in transcript counts would allow these genes to be identified, otherwise, another way to address this issue is to set the value for the condition with no expression to some arbitrary small value 
. This allows such transcripts to be included in the normal analysis scheme, but their calculated fold change will be entirely artificial. Clearly, other approaches to analyze data with linear values and zero expression in one condition are needed.
Selection of differentially-regulated transcripts.
This study has greatly expanded our understanding of the molecular repertoire of the C. elegans genes that are involved in the transcriptional response to infection. Extending previous observations, we have shown that not only is the host response composed of both pathogen-specific and pathogen-shared elements, but that the latter depends upon the site of infection. To enable this work, we needed to construct bioinformatic tools, which we expect to be of broad utility to the research community, especially as with the advent of improved sequencing methods, the predicted structures of genes are constantly evolving.