In this study, we used deep RNA sequencing to define and characterize the transcriptomes of L. monocytogenes strain 10403S and an otherwise isogenic ΔsigB mutant, which does not express the general stress-response sigma factor, σB. The data generated using this approach showed that (i) at least 83% of annotated L. monocytogenes genes are transcribed in stationary phase cells; and (ii) stationary phase L. monocytogenes transcribes 67 ncRNAs, including one σB-dependent ncRNA and seven ncRNAs that, to our knowledge, have not previously been identified in L. monocytogenes. Additionally, RNA-Seq data provided for quantitation of transcript levels and approximate identification of transcriptional start sites on a genome scale. Use of a novel, iterative, dynamic HMM, in combination with RNA-Seq data, identified putative σB-dependent promoters and further defined the L. monocytogenes σB regulon.
The majority of annotated L. monocytogenes genes are transcribed in stationary phase cells
While genome sequencing and microarray approaches have provided important insight into the biology of prokaryotic organisms, including a number of human bacterial pathogens, identification of all genes and their transcriptional patterns remains a major challenge in all areas of biology. Our results demonstrate that global probe-independent approaches for transcriptome characterization are valuable tools for analyzing bacterial transcriptomes [
16,
28,
29]. A major challenge that currently hinders analysis of transcriptomic data generated by approaches such as RNA-Seq is the ability to differentiate between genes with low levels of transcription and background levels of coverage. Several approaches have been used to define cut-off values between background GEI and GEI indicative of low transcript levels (e.g., [
15,
30,
31]). We chose a comparative analysis of
L. monocytogenes 10403S transcript levels with those of a mutant strain that does not express a transcription factor (i.e., the alternative sigma factor σ
B) as a novel approach for robustly defining background RNA-Seq coverage. Our results show that a number of σ
B-dependent genes were solely σ
B-dependent (at least under the conditions used here), as supported by the lack of detectable RNA-Seq coverage in the Δ
sigB strain, despite considerable RNA-Seq coverage of the same genes in the isogenic parent strain 10403S. This is an important observation as a number of σ
B-dependent
L. monocytogenes genes are also activated by other sigma factors (e.g., σ
A [
32,
33]). Using the average GEI for
L. monocytogenes genes that were solely σ
B-dependent in the Δ
sigB strain as a conservative cut-off value for transcribed genes, we found that approximately 83% of
L. monocytogenes 10403S annotated CDS were transcribed in stationary phase cells. These transcribed genes include 355 putative operons, which cover a total of 1,107 genes, indicating that a considerable proportion of
L. monocytogenes genes appear to be transcribed polycistronically. In comparison, a recent study using a tiling microarray identified 517 polycistronic operons that encompass 1,719 genes in
L. monocytogenes EGD-e [
20]. Taken together, these data indicate that the majority of annotated
L. monocytogenes genes are transcribed. This conclusion is consistent with results from a whole-genome tiled microarray transcriptome study of
E. coli MG1655 [
34], which reported transcription of 4052
E. coli MG1655 genes in bacteria grown under different conditions, suggesting that about 98% of the
E. coli MG1655 genes are transcribed.
Our results also demonstrate that RNA-Seq coverage levels (generated with the Illumina Genome Analyzer System) correlate well with quantitative RT-PCR-based mRNA transcript level data. Therefore, in combination with results from previous studies (e.g., in yeast [
15,
31], human cell lines [
35], human tissue [
36], murine tissue [
30]), our findings indicate that RNA-Seq tools can be broadly applied in biological studies to enable quantitative analysis of transcript levels. We also found a positive correlation between RNA-Seq-based transcript levels and codon bias, consistent with the well-documented observation that genes with high codon bias are often highly expressed [
37-
39]. Genes in four role categories, including (i) signal transduction, (ii) viral functions, (iii) amino acid biosynthesis, and (iv) transport and binding, were significantly associated with lower transcript levels. These categories include a number of genes that encode proteins predominantly required for growth and survival under specialized environmental conditions (e.g., viral replication genes) or under conditions other than stationary phase (e.g., amino acid biosynthesis may be less important in stationary phase than during exponential growth as sufficient amino acids from dead bacteria are likely to be available for scavenging), and/or proteins that may only be required in small amounts. On the other hand, we found that genes in seven role categories, including (i) cellular processes, (ii) DNA metabolism, (iii) protein fate, (iv) protein synthesis, (v) purines, pyrimidines, nucleosides, and nucleotides, (vi) transcription, and (vii) genes encoding proteins with unknown functions, showed, on average, higher transcript levels in stationary phase
L. monocytogenes. These findings suggest that genes in these particular categories are important for bacterial cells transitioning from exponential growth to stationary phase.
Overall, the
L. monocytogenes genes with the highest transcript levels were ncRNAs, specifically the transfer-messenger RNA (tmRNA) and 6S RNA, consistent with the observation that tmRNAs are involved with bacterial recovery from a variety of stresses including entry into stationary phase, amino acid starvation, and heat shock [
40]. 6S RNA accumulates in cells during stationary phase; cells lacking 6S RNA have reduced fitness relative to wildtype stationary phase cells [
41]. In addition to down-regulating some housekeeping genes, 6S RNA has been shown to up-regulate expression of some σ
S-dependent genes in Gram-negative bacteria [
41]. σ
S is the stationary phase stress response alternative sigma factor in
E. coli [
42]. Taken together, we hypothesize that 6S RNA plays a critical role in the ability of
L. monocytogenes to survive stationary phase associated stress conditions.
Specific protein-encoding genes with very high transcript levels in stationary phase
L. monocytogenes include
fri, sod, cspB, and
cspL, all genes with some previous evidence for contributions to
L. monocytogenes stationary phase and stress survival [
43-
49].
flaA, which encodes a flagellin protein, was also highly transcribed in stationary phase cells at 37°C. Although
L. monocytogenes has been reported to show flagellar motility only when grown at ≤ 30°C [
50,
51], our results are consistent with the observation that strain 10403S, which was used in this study, has been shown to express flagellin at 37°C [
51]. Interestingly, we also found some annotated CDS without known function to be highly transcribed, including lmo1847 and lmo1849, which encode putative ABC transporters based on BLAST and Pfam [
52] searches, respectively, and lmo1468, which encodes an unknown protein.
RNA-Seq identifies ncRNA molecules in L. monocytogenes, including a σB-dependent ncRNA, in 10403S
Using RNA-Seq, we found 67 previously identified or putative ncRNAs that were transcribed in stationary phase
L. monocytogenes. Of these, 7 represent ncRNAs that have not been identified previously as transcribed in
L. monocytogenes. Sixty of the ncRNAs identified here have previously been reported by Toledo-Arana et al. [
20], Nielsen et al. [
53], Mandin et al. [
22] and Christiansen et al. [
19]. Interestingly, 16
L. monocytogenes ncRNAs with similarities to ncRNAs identified in other bacterial organisms are putative riboswitches. We also found that
sbrE (
rli47), which has no homologies to ncRNA entries in Rfam, appears to be directly regulated by σ
B, based on the considerably higher transcript levels (186 fold) present in the parent strain as compared to the
sigB-null mutant, consistent with results from a recent tiling microarray study [
20]. As the RNA isolation procedure used here selected against small RNA molecules (see Materials and Methods for details), it is likely that additional small ncRNAs not detected here (e.g., some small ncRNAs identified by Toledo-Arana et al. [
20]), are also transcribed in stationary phase
L. monocytogenes 10403S.
Prior to this study,
L. monocytogenes ncRNAs, including potential σ
B-dependent ncRNAs [
53], had been identified using
in silico modeling [
22,
53], co-precipitation with the RNA-binding protein Hfq [
19], and, most recently, tiling microarrays [
20]. While, among these approaches, tiling microarrays [
20] provided the most comprehensive characterization of
L. monocytogenes ncRNAs, deep RNA sequencing also identified a large number of transcribed
L. monocytogenes ncRNAs, including ncRNAs with no similarities to previously identified ncRNAs. Our results, taken together with previous studies that have identified numerous novel transcripts with RNA-Seq in bacteria (
S. meliloti [
28],
B. cenocepacia [
16],
V. cholerae [
29]), yeast [
15,
31], mouse [
30], Arabidopsis [
54], human cell lines [
35,
55], and human tissue [
36], clearly show the power of this technique for characterizing bacterial transcriptomes and ncRNAs.
The L. monocytogenes σB regulon is composed of at least 96 genes, including 82 genes and 1 ncRNA that are preceded by putative σB promoters
As alternative sigma factors, such as σ
B, are known to play critical roles in gene regulation across bacterial genera [
33], we used
L. monocytogenes 10403S and an isogenic Δ
sigB null mutant as a model system for exploring the use of RNA-Seq, in combination with
in silico analyses, for characterization of transcriptional blueprints associated with bacterial regulatory elements. In our study, RNA-Seq identified 96 annotated CDS and one ncRNA SbrE (Rli47) that are up-regulated by σ
B. Quantitative RT-PCR experiments also confirmed σ
B-dependent transcript levels of SbrE (Rli47) (Mujahid et al., unpublished). Among the 96 σ
B-dependent annotated CDS identified in this study, 74 (77.1%) [
10] and 81 (84.4%) [
12] were also identified as σ
B-dependent in stationary phase cells in two previous microarray studies using the same strain background. Also, 63 of the 96 σ
B-dependent genes identified here were reported as positively regulated by σ
B in another
L. monocytogenes strain (EGD-e) grown to early stationary phase [
11]. Twelve genes were identified as σ
B-dependent in both previous microarray studies performed with the same
L. monocytogenes strain background and the same conditions used here, but were not identified as σ
B-dependent by RNA-Seq in this study. This disparity is likely due to the fact that the thresholds and statistical cut-offs used to define σ
B-dependent genes were very stringent in the present study (e.g., a
q-value < 0.05 in all four comparisons).
Overall, in addition to confirming a previously identified σ
B-dependent ncRNA [
20], RNA-Seq identified 13 genes that had not been defined as σ
B-dependent in previous microarray studies of stationary phase
L. monocytogenes 10403S cells [
10,
12], including 5 genes that had been identified as σ
B-dependent in salt stressed cells, but not in stationary phase cells. One gene not previously identified as σ
B-dependent was lmo2003, which encodes a transcription regulator similar to the GntR family. The GntR family of regulators has been characterized as global regulators of primary metabolism in a number of bacteria [
56-
58]. This finding further supports that
L. monocytogenes σ
B appears to be involved in a number of transcriptional regulatory networks [
6]. Increasing evidence indicates that regulatory RNAs also contribute to regulatory networks that involve
L. monocytogenes σ
B. For example, in addition to the σ
B-dependent SbrE ncRNA described here, tiling array analyses also identified additional σ
B-dependent ncRNAs. While previous
in silico studies in
L. monocytogenes strain EGD-e [
53] identified four putative σ
B-dependent ncRNAs (i.e., SbrA, SbrB, SbrC, SbrD), only SbrA was confirmed
in vivo as σ
B-dependent in EGD-e [
20,
53]. Even though our RNA-Seq analyses in 10403S identified SbrA transcripts, transcript levels for this ncRNA were not σ
B-dependent under the conditions used in our study. The fact that SbrA was not found to be σ
B-dependent in 10403S may be due to differences in strains or growth conditions used (e.g., Nielsen et al. [
53] and Toledo-Arana et al. [
20] used strain EGD-e, while we used strain 10403S). Further studies in different
L. monocytogenes strains will thus be needed to understand the full complexity of regulatory networks in this pathogen, including those involving σ
B and ncRNAs.
The quantitative nature of RNA-Seq allowed us to also identify highly transcribed σ
B-dependent genes, including lmo2158 (which encodes a protein similar to the
B. subtilis YwmG), lmo1602 (which encodes an unknown protein), and lmo0539 (which encodes a tagatose-1,6-diphosphate aldolase). Interestingly, none of these genes encode proteins that appear to contribute to any of the presently recognized σ
B-dependent phenotypes in
L. monocytogenes, such as acid resistance [
9,
59], oxidative stress resistance [
59,
60], or virulence [
27,
33,
61,
62]. As there are no published reports of construction and characterization of null mutations in these highly transcribed σ
B-dependent genes, our data clearly suggest that σ
B and the σ
B regulon make additional important contributions to
L. monocytogenes physiology that remain to be characterized.
In conjunction with appropriate bioinformatics tools, such as the iterative, dynamic HMM developed in this study to identify putative σ
B promoters, RNA-Seq data also allowed mapping of approximate transcriptional start and termination sites. Specifically, putative σ
B-dependent promoters were identified upstream of (i) 49 monocistronic σ
B-dependent genes, (ii) 15 σ
B-dependent operons (covering a total of 40 genes), and (iii) 1 σ
B-dependent ncRNA. By comparison, in the absence of genome wide transcriptional start site data, a previous study that solely relied on HMM and genome sequence data identified putative σ
B-dependent promoters upstream of only 40 genes that had been identified as σ
B-dependent by microarray analyses [
10]. Our data reported here show that the majority of σ
B-dependent genes are directly regulated by σ
B and illustrate the power of combining RNA-Seq data and bioinformatics approaches for characterizing transcriptional regulatory systems. Specifically, combining transcriptional start site information with an HMM that identifies promoter motifs (e.g., the motif for σ
B-dependent promoters) provides a powerful approach for identifying genes directly regulated by a given transcription factor. This approach facilitates rapid genome-wide identification of putative transcriptional start sites, which currently represents a critical bottleneck in genome-wide characterization of transcriptional regulation and regulatory networks, as many current strategies for promoter mapping (e.g., primer extension, rapid amplification of cDNA ends (RACE-PCR), RNAse protection assays) are time- and labor-intensive.