|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: KAM JTW LMC. Performed the experiments: CHT CB LMC. Analyzed the data: CHT CB JTW LMC. Contributed reagents/materials/analysis tools: JL KAM JTW LMC. Wrote the paper: CHT JTW LMC.
Bacterial small RNAs (sRNAs) are short transcripts that typically do not encode proteins and often act as regulators of gene expression through a variety of mechanisms. Regulatory sRNAs have been identified in many species, including Mycobacterium tuberculosis, the causative agent of tuberculosis. Here, we use a computational algorithm to predict sRNA candidates in the mycobacterial species M. smegmatis and M. bovis BCG and confirmed the expression of many sRNAs using Northern blotting. Thus, we have identified 17 and 23 novel sRNAs in M. smegmatis and M. bovis BCG, respectively. We have also applied a high-throughput technique (Deep-RACE) to map the 5′ and 3′ ends of many of these sRNAs and identified potential regulators of sRNAs by analysis of existing ChIP-seq datasets. The sRNAs identified in this work likely contribute to the unique biology of mycobacteria.
The genus Mycobacterium contains many clinically relevant pathogens, including Mycobacterium tuberculosis and Mycobacterium leprae, the etiologic agents of tuberculosis (TB) and leprosy, respectively. M. tuberculosis alone was responsible for 8.7 million incident cases and 1.4 million deaths globally in 2011 . The treatment of TB has become increasingly difficult due to its high drug resistance and adaptability; hence, the development of new and more effective treatments for TB is imperative.
Bacterial “small RNAs” (sRNAs) are small (50–400 nt), typically untranslated transcripts. Many sRNAs play important roles in gene regulation in response to environmental changes . sRNAs can originate from their own independent genes or through the processing of larger transcripts . To exert their function, sRNAs typically base-pair with target messenger RNAs (mRNAs), resulting in altered transcription, mRNA stability, or translation . sRNAs are key regulators of pathogenesis in many bacterial species . Recently, RNA-seq has been widely applied to identify novel sRNA candidates in many bacterial species –, including M. tuberculosis . sRNAs have also been identified/predicted in M. tuberculosis using other experimental approaches – and computational analysis of DNA sequence , , . In total, 63 sRNAs have been experimentally validated in M. tuberculosis. sRNAs have also been identified in other mycobacterial species: 34 and 15 sRNAs have been experimentally validated in Mycobcaterium bovis BCG and Mycobacterium smegmatis, respectively .
In a previous study, we used computational predictions from the SIPHT (sRNA Identification Protocol using High-throughput Technologies)  algorithm to identify 144 sRNA candidates in M. bovis BCG. We selected 34 conserved sRNA candidates which we experimentally confirmed by Northern blot . In the current study, we expanded our search to include all the remaining SIPHT predictions for M. bovis BCG as well as all SIPHT predictions for M. smegmatis (these were not explored in the previous study). By combining SIPHT predictions with large-scale Northern blot validation, we have identified 23 and 17 novel sRNAs in M. bovis BCG and M. smegmatis, respectively. Thus, we have substantially increased the number of experimentally validated sRNAs in mycobacterial species. We also analyzed existing ChIP-seq datasets to identify possible regulators of sRNA expression, and used Deep-RACE, a technique that combines high throughput RNA-seq with Rapid Amplification of cDNA Reads (RACE) , to identify sRNA 5′ and 3′ ends. Lastly, it is worth noting that this work is one of the first efforts to better coordinate genome annotation; all sRNA candidates identified in this study have been renamed with a new nomenclature .
M. bovis BCG (Pasteur strain, Trudeau Institute), and M. tuberculosis H37Rv were grown in mycomedium (as previously reported, ). M. bovis BCG and M. tuberculosis cultures were grown for 7 days, with shaking, to late-log phase. Cultures of M. smegmatis MC2155 were grown shaking at 37°C, in trypticase soy media supplemented with 0.05% Tween 80 for 18 hours with shaking (late-log phase).
Small RNA candidates of M. smegmatis were predicted using the SIPHT program with the same parameters as described previously , . SIPHT identifies potential sRNA candidates based on the presence of intergenic sequence conservation upstream of putative Rho-independent terminators. SIPHT has been widely applied in sRNA studies –, and its reliability has been tested and compared with other algorithms .
RNA was isolated as previously reported . Northern blot analysis was performed as previously reported ; probes were designed according to SIPHT predicted sequences and tested in M. bovis BCG, M. smegmatis and M. tuberculosis . All the oligonucleotides that were used in this study are listed in Table S1.
We analyzed existing ChIP-seq datasets for 55 M. tuberculosis transcription factors extracted from a previous study . ChIP-seq peak positions were compared to the 5′ end positions of M. bovis BCG and M. tuberculosis sRNAs from the current study and two previous studies , . For M. bovis BCG sRNAs, we first identified the equivalent region of the M. tuberculosis H37Rv genome. Possible sRNA regulators were selected if the ChIP-seq peak was located within 100 bp upstream and 20 bp downstream of an sRNA 5′ end.
Deep 5′ RACE and Deep 3′ RACE were performed as previously described  with the following exceptions. Deep 5′ RACE libraries and Deep 3′ RACE libraries were pooled and sequenced together using an Ion Torrent 316 chip (Wadsworth Center Applied Genomic Technologies Core Facility). For Deep 5′ RACE, sequence reads were identified by the presence of the expected adapter sequence at the read 5′ end. Adapter sequences were removed and reads of >40 nt were mapped to the reference genomes using BWA . For Deep 3′ RACE, sequence reads were identified by the presence of the expected adapter sequence. Adapter sequences were removed. The oligo-dT stretch was removed by identifying the first consecutive pair of bases not including a “T” and removing all sequence upstream of this. Sequences of >40 nt were mapped to the reference genomes using BWA . For both Deep 5′ RACE and Deep 3′ RACE, 5′ and 3′ ends were identified as the position with the most sequence reads, and with a minimum of 5 reads. Sequences of all primers used for Deep RACE are listed in Table S2.
Using SIPHT, we identified 93 candidate sRNAs in M. smegmatis (refseq: NC_008596) (Table S3) and 144 candidate sRNAs in M. bovis BCG (refseq: NC_008769) (Table S4). Tables S3 and S4 include a detailed description of the predicted coordinates, orientations, sizes and neighboring upstream and downstream genes. Northern probes were designed according to SIPHT prediction. Figure 1 summarizes the overall approach that was employed in this work for sRNA identification and confirmation.
All 93 M. smegmatis sRNA candidates were tested by Northern blot using oligonucleotides in both orientations; expression was confirmed for 18 sRNA (listed in Table 1; see blot pictures in Figure 2 and Figure S1). One of them (Sm32/33) was identified in recent work as IGR-1 with similar size, coordinates and same orientation . Thus, 17 M. smegmatis sRNAs identified here have not been experimentally demonstrated in any previous studies. In our previous study , we reported homologs of 6 M. smegmatis sRNA candidates (Sm32/Sm33, Sm35, Sm46, Sm47, and Sm74) in M. bovis BCG (Mpr13/Mcr14, Mpr20, Mpr3, Mpr4, and Mpr5, respectively). These were confirmed directly in M. smegmatis by Northern blotting in current study and listed in the 17 novel confirmed sRNAs. A homologue of Sm76 was previously identified in M. tuberculosis by RNA-seq  and microarray analysis  but not otherwise experimentally confirmed. All of the validated sRNAs were in the same orientation to that predicted by SIPHT. This suggests that the sequence specificity of SIPHT for this prediction is higher than in our previous work, in which 9 out of 37 of the validated sRNAs were in the opposite orientation to the prediction . All confirmed sRNAs were assigned gene names according to a recently-proposed nomenclature .
Given the practical convenience of testing RNA from both species simultaneously to search for novel sRNA candidates, we used the designed probes for sRNA detection in M. smegmatis to also probe expression of these candidates in M. bovis BCG and M. tuberculosis. Although our focus was to validate M. smegmatis predictions, we fortuitously discovered homologues of 9 candidates in M. bovis BCG and 4 candidates in M. tuberculosis (Table 1, Figure S2). Since these probes were not specifically designed for the other two species, lack of detection could be due to either the absence of sRNA expression or to non-optimization of the probe sequence that was used for hybridization to the targeted region in the M. bovis BCG and M. tuberculosis genome. Also, differences in culture medium might contribute to the low number of expressed homologous sRNAs of M. smegmatis in M. tuberculosis as expression of these sRNAs could be specific to different conditions in M. tuberculosis. Given our focus in sRNA identification, specific conditions that could lead to differences in sRNA expression will be explored in future work.
Twenty-one of the sRNA candidates for M. bovis BCG (Bo12, Bo15, Bo41, Bo52, Bo58, Bo67, Bo68, Bo75, Bo80, Bo85, Bo99, Bo100, Bo111, Bo113, Bo115, Bo117, Bo122, Bo125, Bo126, Bo137, and Bo139) were previously identified, under the nomenclature Mpr 1–21, respectively . Forty-six other candidates were also tested previously but showed no signal; therefore, only the remaining 77 candidates were tested using Northern blotting analysis in this study, and we confirmed expression of 23 new sRNA candidates (Figure 3 and Figure S3). A homologue of Bo46 was previously identified in M. tuberculosis by RNA-seq  but not otherwise experimentally validated. All of the validated sRNAs were in the same orientation as that predicted by SIPHT. We also applied the probes to M. smegmatis and M. tuberculosis and identified 20 and 5 sRNA homologues, respectively (Table 2; Figure 3; Figure S3). All the confirmed sRNAs in M. bovis BCG and M. tuberculosis are listed in Table 2, along with the new nomenclature for sRNAs.
We used Deep-RACE, a previously described approach that combines conventional RACE and deep sequencing to identify 5′ and 3′ ends of selected RNAs , . In total, we identified 5′ ends for 9 sRNAs and 3′ ends for 21 sRNAs. Examples are shown in Figure 4. For some sRNAs we identified multiple 5′/3′ ends. Multiple 5′ ends could be due to multiple transcription start sites or RNA processing. Multiple 3′ ends could be due to RNA processing or may indicate imprecise Rho-dependent termination of transcription.
As noted in our earlier study , the predicted size of the candidate sRNAs correlates only weakly with experimental observations. Only about 17% of the confirmed sRNAs were within 10% of their predicted sizes. Additionally, in many cases, multiple bands were detected by Northern analysis, suggesting the presence of multiple start sites, multiple termination sites, and/or sRNA processing. This is consistent with the Deep RACE data (Figure 4; Figure S4). Deep RACE identified both 5′ and 3′ ends for seven sRNAs. In these cases, the sizes determined by Deep RACE are similar to those confirmed by Northern blotting.
To investigate the potential roles of the novel sRNAs, we mapped them all to the latest annotated genome (National Center for Biotechnology Information, NCBI). Although we aimed to find intergenic sRNAs, half of the candidates we identified in this study overlap partially or entirely protein-coding genes in either the sense or antisense orientation (Table 1, Table 2). We categorized sRNAs into different classes according to their position relative to adjacent coding regions. Where possible, we used 5′/3′ end information from Deep-RACE data. For sRNAs that have only one end mapped by Deep-RACE, the other end was estimated according to the length confirmed by Northern blotting analysis (Figure 4). For sRNAs that have neither end mapped by Deep-RACE, the farthest possible ends were estimated according to Northern blotting analysis and the sRNAs would be categorized as “not determined” if multiple class options exist.
Nine sRNAs in M. smegmatis (Sm19, Sm32/33, Sm35, Sm46, Sm49, Sm64, Sm76, Sm82) and twelve sRNAs in M. bovis BCG (Bo35, Bo48, Bo53, Bo60, Bo71, Bo73, Bo78, Bo86, Bo101, B0105, Bo118, Bo132) were mapped completely to intergenic regions. Four sRNAs in M. smegmatis (Sm38, Sm41, Sm90, Sm93) were mapped to the sense strand of annotated protein-coding genes, and four were mapped to the antisense strand (Sm42, Sm67, Sm68, Sm74). One sRNA in M. smegmatis (Sm11) and five in M. bovis BCG (Bo32, Bo47, Bo81, Bo96, Bo130) overlap partially with adjacent genes in the antisense orientation, and four sRNAs in M. bovis BCG (Bo27, Bo46, Bo82, Bo87) overlap partially with adjacent genes in the sense orientation. One sRNA in M. smegmatis (Sm47) and two in M. bovis BCG (Bo13, Bo29) were not classified.
The location of sRNAs relative to protein-coding genes also gives clues as to their function. Regulatory sRNAs that are completely intergenic typically function by base-pairing with distally-encoded mRNAs; however, some of the sRNAs are close to the 5′ end or 3′ end of adjacent genes, suggesting possible alternative regulatory roles. sRNAs antisense to ORFs or UTRs can regulate expression of the overlapping gene . sRNAs located within UTRs or ORFs in the sense orientation may be degradation products or mRNAs or could be important cis-acting regulatory elements such as riboswitches.
sRNAs can be transcribed independently or generated by processing of mRNA UTRs. Several features of the sRNAs identified in this work are consistent with the sRNAs being independently transcribed from their own promoters. First, the Northern blots showed no evidence of larger bands that could correspond to pre-processed mRNAs. Second, 13 sRNAs (Sm35, Sm42, Sm67, Sm68, Sm74, Bo13, Bo32, Bo60, Bo71, Bo73, Bo81, Bo118, Bo130) are orientated away from the surrounding genes. Third, 5 sRNAs (Sm64, Sm82, Bo47, Bo105, Bo132) are located >200 bp from the nearest gene start/stop. Nineteen sRNAs are close to (<200 bp) upstream or downstream coding regions (Sm11, Sm19, Sm32/33, Sm46, Sm47, Sm49, Sm76, Bo27, Bo29, Bo35, Bo46, Bo48, Bo53, Bo78, Bo82, Bo86, Bo87, Bo96, Bo101) and four (Sm38, Sm41, Sm90, Sm93) overlap coding regions in the sense orientation. It is formally possible that these sRNAs are generated by mRNA processing or premature termination, although the Northern blot analysis argues against this. Regardless, sRNAs processed from mRNAs could still have important regulatory functions , , . Indeed, a recent study identified 3′ UTRs as an abundant source of regulatory sRNAs in Salmonella enterica . Alternatively, sRNAs generated by processing of mRNAs could indicate cis-acting regulatory elements such as riboswitches.
The regulation of sRNAs can provide important clues as to their biological functions. However, very little is currently known about regulation of mycobacterial sRNAs. The genome-wide binding profiles of many M. tuberculosis transcription factors have recently been determined using ChIP-seq and these data are publicly available . Although we identified sRNAs in M. bovis BCG, it is highly likely that these sRNAs are conserved in M. tuberculosis given the extremely high similarity of the M. bovis BCG and M. tuberculosis genomes . Hence, we searched existing ChIP-seq datasets of M. tuberculosis for transcription factors that bind close to sRNA 5′ ends, including sRNAs identified in earlier studies . We identified 10 ChIP-seq peaks (indicative of a transcription factor binding site) located between 100 bp upstream and 20 bp downstream of sRNA 5′ ends (Table S5). Thus, we have identified likely examples of sRNA regulation. In some cases, the ChIP-seq peak is also close to the start of an annotated protein-coding gene. Hence, the transcription factor may regulate the protein-coding gene rather than the sRNA. Nevertheless, in four cases, the ChIP-seq peak is unambiguously associated with an sRNA 5′ end. The two examples with highest ChIP-seq signal are shown in Figure 5. For each of these examples, the transcription factor is otherwise uncharacterized.
In summary, we have identified 17 novel sRNAs in M. smegmatis and 23 novel sRNAs in M. bovis BCG, verified 5′ and 3′ ends, and list these sRNAs according to a recently-proposed annotation nomenclature. Our analysis of sRNA position relative to protein-coding genes suggests various potential roles for these sRNAs in gene regulation. Although the specific biological function of these, and all other known mycobacterial sRNAs, is not understood, we speculate that some of these sRNAs contribute to the biology of pathogenic mycobacterial species. Future studies will focus on the functional characterization of these novel sRNAs.
Northern blotting analysis for M. smegmatis sRNAs.
Northern blotting analysis of sRNA candidates in M. tuberculosis with M. smegmatis and M. bovis BCG probes.
Northern blotting analysis for M. bovis BCG sRNAs.
Deep-RACE mapped reads of all sRNAs and adjacent gene annotations.
The oligonucleotide sequence of all probes used for Northern Blotting analysis in this study.
The oligonucleotide sequence of all primers used for Deep-RACE PCR.
All 93 sRNA sequences predicted by SIPHT in M. smegmatis.
All 144 sRNA sequences predicted by SIPHT in M. bovis BCG. This list excludes predicted tRNAs.
All transcription factor ChIP-seq peaks located within 100 bp upstream and 20 bp downstream of sRNA 5′ ends.
We would like to thank Marquis Martin for his assistance with data analysis, Marlene Belfort for supporting our initial studies (please see funding statement), Todd Gray for helpful discussions and for generously providing materials for this study, Matt Stanger for his assistance with figures, Jeanne DiChiara and Damen Schaak for sample preparation. We thank the Wadsworth Center Applied Genomic Technologies Core Facility for Ion Torrent sequencing.
This work was supported by the Welch foundation, the DTRA Young Investigator program, National Institutes of Health (NIH) grant 1DP2OD007188 (JTW), and by an appointment (CB) to the Emerging Infectious Diseases (EID) Fellowship Program administered by the Association of Public Health Laboratories (APHL) and funded by the Centers for Disease Control and Prevention (CDC). Initial funding for preliminary studies was provided by (NIH) grants GM39422 and GM44844 to Dr. Marlene Belfort. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.