Post-transcriptional regulatory mechanisms are widespread in bacteria. Interestingly, current published data hint that some of these mechanisms may be non-random with respect to their phylogenetic distribution. Although small, trans-acting regulatory RNAs commonly occur in bacterial genomes, they have been better characterized in Gram-negative bacteria, leaving the impression that they may be less important for Firmicutes. It has been presumed that Gram-positive bacteria, in particular the Firmicutes, are likely to utilize cis-acting regulatory RNAs located within the 5′ mRNA leader region more often than trans-acting regulatory RNAs. In this analysis we catalog, by a deep sequencing-based approach, both classes of regulatory RNA candidates for Bacillus subtilis, the model microorganism for Firmicutes. We successfully recover most of the known small RNA regulators while also identifying a greater number of new candidate RNAs. We anticipate these data to be a broadly useful resource for analysis of post-transcriptional regulatory strategies in B. subtilis and other Firmicutes.
A variety of RNA-based regulatory mechanisms have been shown to be important in controlling expression of metabolic pathways, stress responses, developmental processes and pathogenesis. Depending on their structural relationship to the target gene, regulatory RNAs can be divided into two broad categories: those that are co-transcribed (in cis) or transcribed independently (in trans) with respect to the target mRNA. Cis-acting regulatory RNAs are typically located within the 5′ leader region of the target transcript, although they can also be situated within inter-cistronic regions of multi-gene transcripts. Trans-acting regulatory RNAs, which can be encoded either in cis or in trans relative to their target transcript, are transcribed separately from the mRNA target. Both classes of regulatory RNAs are capable of activation or repression of gene expression.
In the past several decades, many small (~70–200 nt) RNAs (‘sRNA’) have been discovered in bacteria (1,2). Overlapping methods have been used for their discovery, including directed cloning and sequencing of sRNA pools, computational approaches, deep sequencing and hybridization of sRNA pools to genomic tiling arrays (3–8). Although a few RNAs serve ‘housekeeping’ functions, such as processing of pre-tRNAs by RNase P, most sRNA candidates identified by these approaches are presumed to function as regulatory agents. For example, one widespread sRNA class regulates gene expression by sequestering an RNA-binding protein (e.g. CsrA) and preventing it from controlling translation of target mRNAs (9). Another widespread class, coined 6S RNAs, regulates expression patterns by structurally mimicking an open promoter and associating with RNA polymerase to prevent transcription initiation of target genes (10). However, most sRNAs are likely to affect gene expression by directly base pairing to one or more target mRNAs (2,11). Some interact with their target mRNA via antisense base paired regions. Typically, these antisense sRNAs associate with the target transcript through formation of long base paired regions of >65 nt in order to regulate mRNA stability, translation or transcription elongation (2). In contrast, many trans-encoded sRNAs associate with their target transcripts through shorter, more imperfect, base pairing interactions (12). Dozens of such sRNAs have been characterized in Escherichia coli and other Gram-negative bacteria. Indeed it has been estimated that 200–300 sRNAs will be present in the average bacterial genome, equivalent in numbers to the total complement of cellular transcription factors (13). Most are individually responsive to different stress conditions; they largely assist the adaptive responses of the microorganism as its local environment undergoes changes. 
Similar to transcription factors, many sRNA regulators are predicted to regulate multiple targets (14–17); therefore, the full range of their regulatory complexities remains to be determined. Many sRNAs bind near to the start of the coding region in order to affect translation initiation efficiency of the target gene (12,18); however, some reduce translation by interacting further upstream within the 5′ untranslated region (5′ UTR) (15,19,20). Yet others reduce expression by associating with the mRNA further downstream, within the coding region (21).
The majority of these studies have occurred with E. coli and other proteobacterial species and it is unclear how meaningful they are for other bacteria. To that end, recent efforts have begun to uncover candidate sRNAs in non-proteobacterial species. Recently, sRNAs have been identified in Gram-positive bacteria, including B. anthracis, Listeria monocytogenes, Staphylococcus aureus, Streptococcus species and Streptomyces coelicolor (22–32). However, the common ‘rules’ for sRNA regulation, as well as their involvement with RNA-binding proteins, are not yet apparent from these analyses (32,33). Historically, B. subtilis has been used as the benchmark model microorganism for Gram-positive bacteria. Yet study of sRNA regulation in this organism is still in an early stage. Therefore, rigorous analysis of the importance and molecular mechanisms for sRNA regulation in B. subtilis is important for elucidation of post-transcriptional regulation in Firmicutes and other Gram-positive bacteria.
Some forms of RNA-mediated regulation have already been demonstrated to be important for B. subtilis. For example, at least 4% of the genome is believed to be subject to control by cis-acting regulatory RNAs alone (34), a figure that almost certainly underestimates the full degree of RNA-mediated regulation. Indeed, little is known about the importance or mechanisms of trans-acting sRNAs, although several individual examples have been identified. A recent publication describing identification of B. subtilis sRNAs via mapping of transcriptionally active regions by high-density oligonucleotide tiling arrays successfully identified most known sRNAs, as well as other new candidates (35). Herein, we describe the discovery of sRNAs and other putative regulatory RNA elements using a sister technique to the latter study. Specifically, we utilized a differential RNA-sequencing (dRNA-seq) approach that is selective for transcriptional start sites (TSS) (36). Our approach successfully recovers most known regulatory RNAs as well as a portion of the microarray-predicted sRNAs, while also identifying many unique candidates. This catalog of candidate regulatory RNA elements will serve as an important reference point for comprehensive analyses of sRNA regulation in B. subtilis and other Bacillus species.
Bacillus subtilis strain 168 and NCIB 3610 were used for 454 pyrosequencing and northern blot analyses, respectively. For total RNA extractions, cells were grown overnight until reaching stationary phase (~20 h) at 37°C in modified glucose minimal medium [(NH4)2SO4 20 g/l, K2 3H2O 183 g/l, KH2PO4 60 g/l, .7H2O 2 g/l, sodium citrate 10 g/l, 0.5% glucose, 0.5 mM CaCl2, 5 µM MnCl2]. For strain 168, tryptophan was added to a final concentration of 50 µg/ml.
Total RNA was harvested from B. subtilis strains cultured at 37°C as described (37). Briefly, these cells were pelleted, re-suspended in LETS buffer (0.1M LiCl, 10mM EDTA, 10mM Tris–HCl, 1% SDS), vortexed with acid-washed glass beads (Sigma-Aldrich) for 4 min, and incubated at 55°C for 5 min. The resulting solution was subjected to extraction by TRIzol reagent (Invitrogen) according to the manufacturer instructions. For subsequent pyrosequencing analysis, total RNA preparations were subjected to RNase-free DNase I (Roche) digestion for 30 min at 37°C in the presence of 0.25 mM MgCl2. RNA was then extracted with phenol:isoamyl:chloroform, ethanol precipitated and re-suspended in water. RNA quality was assessed by resolving samples on 1% formaldehyde–agarose gels and quantified via absorbance spectroscopy.
Total RNA samples (15–20 µg) were heated at 65°C for 10 min in 1 × gel loading buffer (45 mM Tris–borate, 4 M urea, 10% sucrose [w/v], 5 mM EDTA, 0.05% SDS, 0.025% xylene cyanol FF, 0.025% bromophenol blue) and resolved by 6% denaturing (8 M urea) polyacrylamide electrophoresis. RNAs were transferred to BrightStar-Plus nylon membranes (Ambion) using a semi-dry electroblotting apparatus (Owl Scientific) according to manufacturer instructions. The blots were UV-crosslinked and hybridized overnight at 42°C in UltraHyb-Oligo buffer (Ambion) with the appropriate 5′-radiolabeled (32P) DNA oligonucleotide (see Supplementary Table S4 for oligonucleotides used in this study). The blots were then washed 2× for 15 min using low stringency wash buffer (1 × SSC, 0.1% SDS, 1 mM EDTA). Radioactive bands were visualized using ImageQuant software and a Typhoon PhosphorImager (Molecular Dynamics).
cDNA cloning and pyrosequencing were performed as described (38) except without size fractionation of RNA. However, the total RNA was split into two samples, and for one sample primary transcripts containing 5′-PPP were enriched by treatment with Terminator 5′-phosphate-dependent exonuclease (Epicentre). Upon treatment of both samples with tobacco acid pyrophosphatase to generate 5′-monophosphates for linker ligation, RNA samples were poly(A)-tailed using poly(A) polymerase followed by ligation of an RNA adapter to the 5′-phosphate of the small RNAs. First strand cDNA synthesis was then performed using an oligo(dT)-adapter primer and M-MLV H-reverse transcriptase. A detailed protocol for the enrichment, cDNA library generation and subsequent sequencing steps has been described (36). cDNA libraries were sequenced on a Roche FLX sequencer at the M.D. Anderson Cancer Center DNA Analysis Core Facility. The 5′-linker and poly-A tail removal were performed using custom-made Python scripts. The resulting cDNA sequences were then mapped onto the B. subtilis genome (NC_000964.3) using ‘segemehl’ software (39). Mapped reads were visualized using Integrated Genome Browser (IGB).
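The custom scripts used for 5′-linker and poly-A tail removal are not reproduced in the text. The following is only a minimal sketch of how such trimming might look, not the authors' implementation. The linker sequence `FIVE_PRIME_LINKER`, the minimum tail length `min_tail` and the function names are hypothetical placeholders; the length cut-off reflects the >15 nt criterion applied to mapped reads.

```python
import re

# Hypothetical 5' linker: the actual adapter sequence is not given in the text.
FIVE_PRIME_LINKER = "ACGTACGTAC"


def trim_read(read, linker=FIVE_PRIME_LINKER, min_tail=8):
    """Remove a leading 5' linker and a trailing poly-A tail from one read.

    min_tail is an assumed threshold: a run of at least this many A residues
    at the 3' end is treated as the added poly(A) tail.
    """
    seq = read.upper()
    if seq.startswith(linker):
        seq = seq[len(linker):]
    # Strip a terminal run of A residues of at least min_tail nucleotides.
    tail_pattern = "A{" + str(min_tail) + ",}$"
    return re.sub(tail_pattern, "", seq)


def passes_length_filter(seq, min_length=15):
    """Keep only trimmed reads longer than min_length nucleotides."""
    return len(seq) > min_length
```

One limitation of this approach is that genomic A residues directly preceding the added tail cannot be distinguished from the tail itself, so the 3′ ends of trimmed reads should be interpreted cautiously.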
Bacillus subtilis strain 168 cells were cultured in glucose minimal medium until stationary phase, whereupon total RNA was extracted using standard methods. Most bacterial RNAs, including both mRNAs and sRNAs, contain a triphosphate moiety at their 5′ terminus, whereas processed transcripts, such as rRNAs and tRNAs, contain a 5′ monophosphate. To specifically enrich our samples for primary transcripts, half of the sample was treated with terminator exonuclease that preferentially degrades 5′-monophosphorylated RNAs. cDNA libraries were then prepared from these two pools (‘unenriched’ and ‘enriched’) and analyzed by 454 pyrosequencing. After 5′-linker and poly-A tail removal, a total of 406 531 cDNA reads (>15 nt in length) could be successfully mapped to the B. subtilis genome. Most of these sequences corresponded to ribosomal RNAs and tRNAs, which were excluded from further analysis. At the end, 25 675 and 44 098 cDNA reads were obtained for the unenriched and enriched libraries, respectively, and visualized with IGB (Supplementary Table S1; IGB, Affymetrix). The majority mapped to intergenic regions, due to the enrichment for cDNA reads located proximal to the TSS (36). Based on these data we were able to identify ~600 potential TSSs in the B. subtilis genome, which appeared to be modestly increased nearer to the origin of replication (Supplementary Figure S1). From this analysis, classes of TSS could be identified for both sense and antisense RNAs (Supplementary Figure S2). Also, signals that were likely to correspond to long 5′ mRNA leader regions could be identified for some genes, whereas other cDNA reads appeared likely to correspond to small regulatory RNAs (sRNAs) (Supplementary Figure S2). We concentrated our efforts on the latter classes, which are most likely to include cis- and trans-acting regulatory RNAs.
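The logic of comparing the enriched and unenriched libraries can be illustrated with a short sketch. This is not the procedure used in the study, whose exact criteria are described elsewhere (36); the thresholds `min_reads` and `min_ratio`, the pseudocount and the function name are assumptions chosen for illustration only. Each read is represented by the strand and genomic coordinate of its 5′ end.

```python
from collections import Counter


def call_tss(unenriched_starts, enriched_starts, min_reads=3, min_ratio=1.5):
    """Return candidate TSS positions as sorted (strand, position) tuples.

    A position is reported when it is supported by at least min_reads read
    starts in the enriched library and its library-size-normalised frequency
    is at least min_ratio times that in the unenriched library.
    Both thresholds are illustrative assumptions.
    """
    unenriched = Counter(unenriched_starts)
    enriched = Counter(enriched_starts)
    n_unenriched = max(sum(unenriched.values()), 1)
    n_enriched = max(sum(enriched.values()), 1)

    candidates = []
    for site, count in enriched.items():
        if count < min_reads:
            continue
        enriched_fraction = count / n_enriched
        # A pseudocount of 1 avoids division by zero for sites absent
        # from the unenriched library.
        unenriched_fraction = (unenriched.get(site, 0) + 1) / n_unenriched
        if enriched_fraction / unenriched_fraction >= min_ratio:
            candidates.append(site)
    return sorted(candidates)
```

Normalising by library size matters here because the two libraries differ in depth (25 675 versus 44 098 reads after removal of rRNA and tRNA sequences).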
Bacillus subtilis employs a wide variety of cis-acting regulatory RNA elements (34). In contrast to the average length of ~360 nt (±150 nt) for 5′ leader regions of transcripts including cis-acting regulatory RNAs, the overall distribution of leader lengths based on the TSS map peaks around 35 nt (Figure 1) (34). This observation suggests that ‘long’ 5′ leader regions are likely to occur only when specialized functions are encoded within them. One possible function of a long 5′ leader region is to incorporate structural elements that affect the stability of the overall transcript. Alternatively, some long 5′ leader regions include sequence and structural components that help guide intracellular mRNA localization. However, the most likely explanation for a long 5′ mRNA leader region is the inclusion of a cis-acting, signal-responsive regulatory RNA. Therefore, unbiased experimental methods capable of identifying long 5′ leader regions, such as high-throughput sequencing of TSS, offer a potentially powerful approach for discovery of new regulatory RNA elements. Current bioinformatics-based approaches are likely to include bias for phylogenetically widespread and highly conserved regulatory RNAs. In contrast, unbiased mapping of TSSs is expected to uncover 5′ mRNA leader regions without regard for phylogenetic distribution, even when they include poorly conserved, recently evolved, or highly degenerate cis-acting regulatory sequences.
To that end, we investigated whether 454 pyrosequencing of B. subtilis stationary phase RNAs was capable of identifying previously established long 5′ leader regions. Previous data have established a minimum of 24 protein-responsive cis-acting regulatory RNAs, 19 tRNA-responsive cis-acting regulatory RNAs, 32 metabolite-responsive cis-acting regulatory RNAs and one metal-sensing regulatory RNA (34,37). Long 5′ leader regions were correctly identified in our data set for 74, 67 and 62% of tRNA-, metabolite- and protein-responsive regulatory RNAs, respectively (Supplementary Table S2A). Moreover, a qualitative assessment of putative start sites determined in our data set matched on average within 1 nt of the previously established start sites (as cataloged by DBTBS; 40). Of the ~600 putative TSSs identified herein, 93 were located at least 100 nt away from the downstream gene and did not already correspond to a known long leader region (Supplementary Table S1). There are bound to be false positives within this data set, i.e. start sites that do not correspond to synthesis of a long UTR. For example, it is possible that transcription could initiate upstream of a gene for synthesis of a separate, unique sRNA gene, having nothing to do with expression of the downstream gene. Therefore, as a conservative strategy for specifically identifying long leader regions, we assessed most closely those cDNA reads that start within an intergenic region but that cumulatively overlap with the downstream coding sequence (or that end within 10 nt of the downstream gene). We assumed that this arrangement would result in the highest confidence for assigning leader regions. A total of 40 examples fit this description (Table 1; Supplementary Table S3). The fact that the cDNA signals for the remaining 53 stop upstream of the downstream coding region does not automatically eliminate them as corresponding to 5′ leader regions. 
Indeed, many previously established cis-acting regulatory RNAs resemble the latter pattern, presumably due to the presence of an intrinsic terminator element before the coding region. Also, several transcripts that have been demonstrated to contain long 5′ leader regions of unknown function appear in our data (e.g. srfAA, yxbB), although their cDNA reads do not fully continue into the downstream gene.
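The conservative criteria described above for assigning long leader regions can be summarised in a short sketch. The data structure, field names and category labels are hypothetical; only the two numerical cut-offs (a TSS at least 100 nt from the downstream gene, and cDNA reads that overlap the coding sequence or end within 10 nt of it) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class LeaderCandidate:
    tss: int          # coordinate of the putative transcription start site
    gene_start: int   # coordinate of the first nucleotide of the downstream gene
    reads_end: int    # most downstream coordinate covered by reads from this TSS
    strand: str = "+"


def classify_leader(c, min_leader=100, max_gap=10):
    """Classify a TSS by leader length and by how far its reads extend."""
    if c.strand == "+":
        leader_length = c.gene_start - c.tss
        gap = c.gene_start - c.reads_end
    else:
        # On the minus strand, transcription runs toward lower coordinates.
        leader_length = c.tss - c.gene_start
        gap = c.reads_end - c.gene_start
    if leader_length < min_leader:
        return "short_leader"
    if gap <= max_gap:
        # Reads overlap the coding sequence (gap <= 0) or stop within 10 nt of it.
        return "high_confidence_long_leader"
    # Reads stop upstream, e.g. at an intrinsic terminator; not excluded as a leader.
    return "possible_long_leader"
```

Under this scheme the 40 examples in Table 1 would fall into the high-confidence class, whereas the remaining 53 would be retained as possible long leaders rather than discarded.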
Interestingly, several transcripts with newly identified long leader regions can be grouped relative to their expected functional roles. For example, five such transcripts contained moderately ‘long’ 5′ leader regions (104, 104, 119, 121 and 143 nt) upstream of genes encoding ribosomal protein homologues. Post-initiation regulation of ribosomal protein genes is common in bacteria. Oftentimes, ribosomal proteins (r-proteins) bind to structural motifs located within the leader region to control expression of the downstream genes, in order to coordinate their overall stoichiometry with other r-proteins (41–44). It is possible that the moderately long 5′ leader regions identified herein are required for similar regulatory mechanisms. Certain RNase enzymes have also been demonstrated to post-transcriptionally autoregulate their expression by interacting within their 5′ leader region (45). Therefore, the fact that the recently identified RNase Y (ymdA) gene appears to be preceded by a 164 nt leader region could be suggestive of a similar mechanism. Also, transcripts encoding for certain core transcription elongation subunits (rpoB, greA, nusA—located in an operon with ylxS) also appear from our data to contain a long 5′ leader region, suggesting they also may be subjected to post-initiation control.
Another functionally related group of transcripts within this list encodes central metabolism genes. For example, the 5′ leader regions for certain tricarboxylic acid cycle transcripts, including pyruvate dehydrogenase (pdhA), citrate synthase (citZ) and succinate dehydrogenase (odhA), appear to be 212, 199 and 100 nt in length, respectively. Similarly, the 5′ leader region for an oxidative phosphorylation gene, menaquinol oxidase (qoxA), is 248 nt in length. A single glycolysis-related transcript, which encodes fructose 1,6 bisphosphate aldolase (fbaA), also contains a long 5′ leader region (111 nt). It remains to be determined whether there are any common sequence or structural features between the 5′ leader portions of these central metabolism transcripts. Further experimentation will be required to assess whether the new 5′ leader regions identified in this study contain elements that are important for post-transcriptional regulation of their associated genes.
One of our primary motivations for performing the experimentation described herein was to validate previously established sRNAs and, preferably, to discover new examples. There have been 14 non-housekeeping sRNAs identified previously in B. subtilis, although only a few have been studied in detail. Two have been identified as 6S RNAs (6S-1 and 6S-2; 46,47). Other studies have predicted a small suite of candidates, some of which may be under control of sporulation-specific sigma factors (35,48,49). Of these candidates, mRNA targets have been experimentally identified for only a few. For example, two antisense RNAs have been demonstrated to regulate a toxin gene (txpA) and an unknown gene (yabE), respectively (50,51). One sRNA, SR1, controls expression of a transcriptional activator of arginine catabolism, AhrC (52), while another, FsrA, controls iron-responsive genes (sdhCAB, citB, yvfW, leuCD) (53). Recent discoveries in S. aureus revealed an sRNA (RsaE) that is widely conserved amongst Gram-positive species, including B. subtilis. It appears to target central metabolism genes and cstA, which encodes a ‘carbon starvation’ protein (27). A more recent investigation, which examined the global transcriptional profile of B. subtilis by high-density oligonucleotide tiling arrays, resulted in identification of 54 new sRNA candidates (35). This raised the total number of sRNAs proposed for B. subtilis to ~70. To find novel sRNA candidates using 454 pyrosequencing of stationary phase RNAs, we searched for cDNA peaks that occurred specifically and entirely within intergenic regions, and which oftentimes included an identifiable intrinsic transcription terminator at the 3′ terminus. Of the 14 previously identified sRNAs, our analysis recovered seven: 6S-1, 6S-2, fsrA, bsrE, bsrF (SR2), bsrG and bsrH (Tables 2 and 3, Supplementary Table S2B; 46,47,49,53).
Additionally, pRNA, a small RNA oligonucleotide that is synthesized by RNA polymerase using 6S as a template, could be detected for 6S-1 but not 6S-2 (Figure 2). The complicated relationship between the 6S and pRNA expression profiles will be addressed more fully in a different, future publication (R. Hartmann, personal communication). SurA, another previously identified sRNA, appeared from our analysis to be an antisense RNA since it appeared to overlap with the adjacent yndL gene. A putative sRNA was also previously identified within the polC-ylxS locus. This particular candidate was not specifically found within our analysis; instead, our data exhibited cDNA reads at the same locus that appeared to correspond to a long 5′ leader region for the ylxS gene. The remaining previously identified sRNA candidates (bsrC, bsrD, bsrI, SR1, SurC) could not be detected in our data set, potentially due to limited expression during our growth conditions. SurC, for example, is exclusively expressed during sporulation (48). Also, from the 54 putative sRNAs identified by Rasmussen et al. (35), we detected 11 (20%) total. Finally, our analysis detected 50 new unique sRNA candidates (Tables 2 and 3). We did not specifically investigate whether these RNAs exhibited putative open reading frames; therefore, we cannot exclude that a subset might encode small peptides.
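The requirement that sRNA candidates lie entirely within intergenic regions can be expressed as a simple interval test. The sketch below is illustrative only: it ignores strand, does not search for intrinsic terminators, and uses hypothetical function names and inclusive genomic coordinates.

```python
def is_intergenic(candidate, genes):
    """Return True if the candidate interval overlaps no annotated gene.

    candidate is an inclusive (start, end) tuple; genes is an iterable of
    inclusive (start, end) tuples for annotated coding regions.
    """
    c_start, c_end = candidate
    for g_start, g_end in genes:
        # Two inclusive intervals overlap when each begins before the other ends.
        if c_start <= g_end and g_start <= c_end:
            return False
    return True


def intergenic_candidates(candidates, genes):
    """Keep only the cDNA peaks that fall entirely between annotated genes."""
    return [c for c in candidates if is_intergenic(c, genes)]
```

A peak that overlaps an annotated gene on the opposite strand would fail this test; such cases are better treated as antisense RNA candidates.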
An interesting feature of sRNAs in Gram-negative bacteria is their phylogenetic distribution. For example, it is not uncommon to find sRNAs that are well conserved among the γ-proteobacterial species. It is not yet clear why these sRNAs have not evolved more rapidly among these organisms, but it is generally assumed that the primary sequence and secondary structure conservation for certain sRNAs has been retained to maintain intermolecular interactions with a common mRNA target. However, it is also possible that certain sRNAs exhibit phylogenetic conservation because they are constructed from exceptionally successful structural scaffolds, which are optimized for both interactions with target mRNAs and protection against RNases. Of the sRNA candidates identified in this study, most can be identified only in B. subtilis or the most closely related Bacillus species that have been sequenced. However, a few B. subtilis sRNA candidates also appeared to be present in genome sequences of other Bacillus species (Tables 2 and 3). Overall, this suggests that the B. subtilis sRNAs are likely to be more limited in their phylogenetic distribution than their proteobacterial counterparts.
Most striking in its phylogenetic distribution is RsaE, which has been identified in two prior studies (27,35) and can be found in diverse Gram-positive bacteria, including Staphylococcus, Lysinibacillus, Geobacillus, Listeria and Bacillus species. In B. subtilis, the top mRNA candidate for interaction with RsaE is cstA, which encodes for an uncharacterized carbon homeostasis protein (27,35). However, this gene does not appear to be a target in Staphylococcus species. Therefore, it is still unclear why this particular sRNA exhibits such high, albeit lineage-sporadic, distribution.
Several sRNA candidates that were identified herein but that were also discovered by prior studies (bsrE, bsrH, ncr39, ncr10 and ncr60) can also be found within the genomes of other Firmicutes, most often Bacillus species (Tables 2 and 3). Comparative sequence alignments of these sRNA genes reveal several instances of covarying residues within putative helices, which together predict the occurrence of secondary structure features common to each sRNA class (data not shown). Additionally, a few novel sRNAs discovered by our current analysis appear to be conserved amongst genomes of a few other Bacillus species. For example, ncr1015 can be identified in the genomes of B. subtilis, B. amyloliquefaciens, B. licheniformis, Brevibacillus species and Paenibacillus species. Similarly, ncr2637 can be found in Anoxybacillus flavithermus, B. subtilis, B. amyloliquefaciens, B. licheniformis and B. pumilus. It is not yet obvious why these particular sRNA candidates are conserved in these other organisms, although a common mRNA target would be the primary assumption.
These data together help create an inventory of sRNA candidates in B. subtilis. However, demonstrating that they are functionally required for genetic regulation is a challenging endeavor. Three experimental methods are traditionally used to add more confidence in individual sRNA candidates: (i) independent detection by alternative experimentation (e.g. by northern blot analysis), (ii) demonstration of a reliance upon Hfq for stability and (iii) prediction and validation of mRNA targets. The role(s) of Hfq in Gram-positive bacteria is still poorly defined; therefore, this was not taken into account in the current study. Instead, we chose 11 of the longest and most highly expressed sRNA candidates for validation by northern blot analyses. All of the sRNA candidates that were chosen for northern analysis could be successfully detected (Figures 3, 4 and 6). Also, several appeared to be subjected to intracellular processing, given that they corresponded to lengths shorter than their predicted size (Figures 3 and 4). Other sRNA candidates were not assessed in this way, as they appeared to exhibit lower expression levels that are likely to be within the range of detection by deep sequencing methodology but not by northern blot analyses. Therefore, although much more experimentation is still required, preliminary experimentation on a subset of the candidate sRNAs appeared to validate their intracellular presence.
In order to begin assessing putative mRNA targets for these sRNA motifs we subjected the sRNA candidates to analysis by TargetRNA, a program designed to search for interrupted base pairing interactions within intergenic regions (54). As it is difficult to differentiate false positives from actual mRNA targets using this software alone, we interpret these predictions with caution. Only a subset of the target predictions, which exhibited particularly low estimated P-values, is highlighted in Figures 3 and 4. One possible explanation for the lack of mRNA targets for certain sRNA candidates is that the latter may target portions of protein coding sequences to affect mRNA stability or translation (21), an interaction that is not addressed by current prediction software. Additional experimentation will be required in order to determine whether these and other mRNAs represent actual targets for the sRNA candidates newly identified herein.
The genome of B. subtilis contains several prophages (SPβ, skin, PBSX) and prophage-like (pro1-pro7) regions, which are typically characterized by higher-than-background A + T nucleotide composition (55–58). Prophages, like plasmids, conjugative transposons and introns, are mobile elements that can be transferred horizontally, occasionally causing genomic rearrangements in bacteria. These elements often carry beneficial traits, such as antibiotic resistance cassettes or virulence factors that could help the host adapt to their environment.
From our analysis, we detected 16 putative sRNAs originating mainly from the SPβ, skin, pro6 and pro7 loci (Figure 5). Some of these sRNA candidates, in fact, were the highest expressed sRNAs in our data set (data not shown). None of the putative sRNAs described herein (Tables 2 and 3) were identified within the PBSX or pro1-pro5 regions. It is generally assumed that a subset of phage genes expressed during lysogenic phase either confer a particular selective advantage for the host or are important for maintaining the phage-host equilibrium. We predict that some of the sRNAs identified herein are likely to perform similar functions. However, it remains to be determined whether these sRNAs target genes within the phage loci or specific host genes, although there is precedent for both scenarios in other prokaryotes as well as eukaryotes (59–63).
Interestingly, six of the sRNA candidates within phage-like regions are predicted to interact in pairs through antisense interactions. These three sRNA pairs are co-organized within intergenic regions in distinct tail-to-tail arrangements; their 3′ terminal ~100 nt overlap and therefore are predicted to interact via antisense pairings (Figure 6). Several of the sRNAs that we predict to be organized in this manner have been identified previously, although their corresponding antisense partners were not (49). Specifically, our data suggest that sRNA candidates bsrE, bsrG and bsrH, which were identified previously, pair through intermolecular antisense interactions with newly identified ncr1019, ncr1058 and ncr1155, respectively. For convenience, we refer to these various sRNAs as bsrE, bsrG, bsrH, as-bsrE, as-bsrG and as-bsrH in order to denote their antisense pairings. The three pairs of these RNA molecules are located within different prophage or prophage-like regions: bsrE/as-bsrE in pro6, bsrG/as-bsrG in SPβ and bsrH/as-bsrH in skin. In addition, we also noticed that one of the pairs (bsrH/as-bsrH) is situated adjacent to a previously established toxin–antitoxin (TA) system, txpA/ratA (Figure 6; 50). Of particular note is that the txpA/ratA TA system shares a similar overall arrangement with the newly identified antisense-based RNA pairs (Figure 6). Most TA modules consist of two components: a stable toxin and a labile antitoxin. The txpA/ratA system represents a typical type I TA system that includes an mRNA encoding for a short, toxic peptide (TxpA) and an antitoxin that is comprised of an antisense RNA (RatA). In contrast, type II TA systems rely on a protein factor as the antitoxin (64–66).
Based on these observations we examined each of the six sRNA candidates for small open reading frames. Interestingly, only one sRNA from each pair exhibited the potential to encode a peptide of ~30 amino acids and included an appropriately spaced ribosome binding site (as-bsrE, bsrG, bsrH; Figure 6). All three peptides are predicted to contain a single α-helical transmembrane domain of ~20 amino acids with several additional charged residues at the C-terminus. This arrangement is consistent with type I toxins (Figure 6; 66). TxpA encodes a similar peptide but with a modestly longer C-terminus. The remaining sRNAs (bsrE, as-bsrG, as-bsrH) did not exhibit any similar peptide-encoding potential, consistent with a potential role as an antitoxin. However, these latter RNAs shared some primary sequence features and a common overall secondary structure arrangement, which consists of four stem-loop regions (Figure 6). Approximately 100 nt, located between two of the helical elements (P2 and P4), exhibited base pairing potential to the 3′-end of the respective peptide-encoding mRNAs. RatA appears to have a similar secondary structure arrangement but with a longer 5′-portion (data not shown). The molecular mechanisms by which these antisense RNAs control toxin expression are still unclear. However, it has been proposed previously (50) that extensive 3′-end base pairing could promote simultaneous degradation of both RNAs.
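A search for small open reading frames of the kind described above can be sketched as follows. This is an illustrative scan, not the analysis performed in the study: the start codon set, the Shine-Dalgarno-like motif `GGAGG`, the permitted spacer of 4 to 12 nt and the peptide length window of 20 to 50 codons are all assumptions, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}


def has_rbs(seq, start, motif="GGAGG", spacer=(4, 12)):
    """Check for the motif ending spacer[0]..spacer[1] nt upstream of the start codon."""
    shortest, longest = spacer
    for gap in range(shortest, longest + 1):
        motif_end = start - gap
        motif_begin = motif_end - len(motif)
        if motif_begin >= 0 and seq[motif_begin:motif_end] == motif:
            return True
    return False


def find_small_orfs(seq, min_codons=20, max_codons=50):
    """Return (start, end, n_codons) for small ORFs preceded by an RBS-like motif.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    n_codons counts the start codon but not the stop codon.
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for start in range(len(seq) - 5):
        if seq[start:start + 3] not in START_CODONS:
            continue
        # Walk downstream codon by codon until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3
                if min_codons <= n_codons <= max_codons and has_rbs(seq, start):
                    hits.append((start, pos + 3, n_codons))
                break
    return hits
```

Only the forward strand of the supplied sequence is scanned, so each member of an antisense pair would need to be supplied in its own 5′ to 3′ orientation.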
Type I TA systems were originally discovered as an important component of plasmid maintenance mechanisms in E. coli (67). More recently they have also been discovered in many bacterial chromosomes (66,68–70). It has been theorized that coupling TA systems to control of plasmid replication would ensure that plasmid-free cells are killed by toxin accumulation, a phenomenon termed ‘post-segregational killing’ (71). Similarly, the txpA/ratA antisense module was proposed to be important for ensuring propagation of an accompanying phage genome in host cells (50). Our analysis therefore uncovers three more potential TA systems that are distributed in different prophage regions, suggesting that RNA-based TA mechanisms could be more common than previously recognized. Interestingly, we also note that two of the toxin-encoding mRNAs (as-bsrE and bsrG) are predicted to contain a ResD binding site within their putative promoter regions, suggesting a potential linkage between toxin expression and oxygen limitation (data not shown; 72).
In addition to the newly identified TA systems described above, our analysis also revealed other putative antisense RNAs (asRNAs). These asRNAs could be assigned to several categories based upon their arrangement relative to the target mRNA. A subset of the asRNAs exhibited the potential to pair with the entire target mRNA, while others pair with only a portion of the target mRNA through head-to-head (5′-overlap) or tail-to-tail (3′-overlap) interactions. In total, 29 candidate asRNAs were identified in our analysis (Table 4; Supplementary Table S2B). Two of them (ratA and as-yabE) were previously shown to regulate expression of target mRNAs (the toxin-encoding txpA mRNA and yabE, an mRNA encoding a cell-wall binding protein, respectively) (Supplementary Table S2B; 50,51). SurA, which overlaps a σK-regulated transcript, yndL, has been shown to accumulate during sporulation but its regulatory capabilities remain to be demonstrated (48). Ten of the asRNA candidates discovered herein were also identified via the high-density tiling array analysis (35). These include asRNAs for the major vegetative growth phase sigma factor, sigA, the teichoic acid biosynthetic enzyme, ggaA, a leucine biosynthetic enzyme, leuA, a choline transporter, opuBD, and a cryptic SpoIIIJ-associated protein, jag. The remaining asRNAs identified herein are novel and are predicted to pair with a variety of characterized and uncharacterized genes (Table 4).
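The overlap categories described above can be expressed as a simple coordinate comparison. The following is a minimal sketch under stated assumptions rather than the classification procedure used in this study: the `Transcript` record, the `classify_antisense` function and the extra "internal", "same-strand" and "no-overlap" labels are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    start: int   # leftmost genome coordinate (inclusive)
    end: int     # rightmost genome coordinate (inclusive)
    strand: str  # "+" or "-"


def classify_antisense(asrna, mrna):
    """Label how a candidate asRNA overlaps its putative target mRNA."""
    if asrna.strand == mrna.strand:
        return "same-strand"
    if asrna.end < mrna.start or asrna.start > mrna.end:
        return "no-overlap"
    if asrna.start <= mrna.start and asrna.end >= mrna.end:
        return "full"
    # Genome coordinates of the mRNA 5' and 3' ends depend on its strand.
    five_prime = mrna.start if mrna.strand == "+" else mrna.end
    three_prime = mrna.end if mrna.strand == "+" else mrna.start
    if asrna.start <= five_prime <= asrna.end:
        return "head-to-head (5'-overlap)"
    if asrna.start <= three_prime <= asrna.end:
        return "tail-to-tail (3'-overlap)"
    # Assumed extra category: asRNA lies entirely inside the mRNA.
    return "internal"


if __name__ == "__main__":
    # Coordinates are invented for the example.
    target = Transcript(1000, 2000, "+")
    print(classify_antisense(Transcript(900, 1100, "-"), target))
    print(classify_antisense(Transcript(1900, 2100, "-"), target))
    print(classify_antisense(Transcript(900, 2100, "-"), target))
```

Because the 5′ and 3′ ends are resolved from the strand of the mRNA, the same function applies to targets on either strand.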
One potentially interesting asRNA candidate is ncr1430, which overlaps with the 5′ leader region of bglP (Table 4; Figure 7). bglP encodes a sugar phosphotransferase system (PTS) component that is involved in the utilization of β-glucosides, such as arbutin and salicin. It is transcribed as an operon with bglH, which encodes the enzyme that metabolizes these sugars (73). Full expression of the bglPH operon has been shown to be under regulatory control by CcpA-mediated catabolite repression and by a transcription attenuation mechanism mediated by the RNA antiterminator (RAT) element located within the 5′ leader region (74). Under inducing conditions and in the absence of glucose, an antiterminator protein, LicT, binds to the RAT element to stabilize formation of an antiterminator structure, thereby allowing transcription to proceed to the downstream coding region (75). Based upon its location, ncr1430 is predicted to base pair with the bglPH mRNA within the region between the RAT element and the downstream bglP open reading frame, perhaps to repress translation. We hypothesize that this arrangement could provide a second layer of post-transcriptional regulation to allow finer control over the level of proteins needed to import and process the appropriate sugar. If true, these data highlight a unique example of regulation of a single gene by a transcriptional mechanism as well as by both cis- and trans-acting regulatory RNAs.
Bacillus species, which are typically motile, aerobic endospore-forming microorganisms, have been isolated from diverse locales including soil, water sources and plant root systems (76). Contributing to its adaptive abilities, B. subtilis is capable of differentiating from dividing cells to metabolically inactive spores, which are resistant to many chemicals, irradiation and desiccation. As mutually exclusive alternative lifestyles, B. subtilis can also initiate developmental pathways that culminate with cells that are competent (capable of DNA uptake and primed for homologous recombination), or function as dedicated producers of biofilm extracellular matrix constituents (77,78). Indeed, B. subtilis can form a multicellular community, consisting of spatially and temporally located cellular subtypes (77–79). Responsible in part for these biological properties are many transcription factors and a suite of alternative sigma factors that regulate transcription of the developmental pathway genes. However, given their general importance in Gram-negative bacteria, it stands to reason that RNA-based regulatory strategies are also likely to be important for coordination of multicellular behaviors and developmental pathways.
Collectively, our results argue for broader roles for small RNA regulators in B. subtilis. In combination with other data, we increase the number of potential small RNAs in B. subtilis to upwards of 100 candidates. The next step will be to identify the biological functions of these RNAs. We hypothesize that, owing to differences between Gram-positive and Gram-negative bacteria in several key proteins involved in RNA metabolism (e.g. ribonucleases, Hfq and Rho), future studies of these regulatory RNAs will reveal novel mechanisms and add to the repertoire of bacterial RNA-based genetic regulatory strategies.
Supplementary Data are available at NAR Online.
The University of Texas Southwestern Medical Center Endowed Scholars Program; The Searle Scholars Program; National Institutes of Health (GM081882); The Welch Foundation (I-1643); Sara and Frank McKnight fellowship (to I.I.). Funding for open access charge: Searle Scholars Program; UT Southwestern Medical Center Endowed Scholars Program; Welch Foundation (I-1643); National Institutes of Health (GM081882).
Conflict of interest statement. None declared.
We are grateful to the members of the Winkler lab for helpful discussions.