We present a computational method for predicting targets of sRNA genes in bacteria. While numerous in silico approaches have been proposed recently for identifying targets of miRNAs in eukaryotes (6–16), there has been a relative dearth of such approaches for sRNAs in bacteria. This lack of in silico methods may be due, in part, to the paucity of reported targets of sRNA regulation. To date, E.coli contains the best-studied set of sRNAs and targets. We compiled a list of 12 such targets in E.coli, all described in the literature prior to 2005. This set of targets was examined for common features, and a computational method was developed for predicting novel targets. The effectiveness of the approach was evaluated on the training set of 12 reported targets as well as on sets of predictions for the RyhB, OmrA, OmrB and OxyS sRNAs, for which the predictions could be compared with results from whole genome expression analyses.
The percentage of computationally predicted targets for which there was experimental support from microarray and northern blot assays ranged from ~4 to 8% for the OmrA and OmrB RNAs to as high as 56% for the RyhB RNA. The different success rates with different sRNAs may reflect several factors, including limitations of the microarray analysis that lead us to underestimate the success rate of the predictions, and the possibility that the program is less useful for some RNAs than for others.
The low success rate for RNAs such as OmrA and OmrB may be due to a number of caveats associated with our experimental analysis. If pairing of an sRNA frequently leads to translational inhibition without mRNA degradation, our assay method, which is dependent upon changes in the mRNA levels, would improperly count the result as negative. Future experiments that directly test translation will be necessary to address this possibility. In addition, because we only evaluated targets that were detected at a sufficient level in the vector control to be judged ‘present’, the nature of the targets and their level of expression may change our evaluation of success. For instance, it is possible that RyhB target mRNAs are more abundant, in general, than OmrA and OmrB target mRNAs. If this is the case, OmrA and OmrB targets would be more likely to be deemed ‘absent’ under the assayed growth conditions and the rate of success for our computational predictions could be underestimated.
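The way the success rate depends on which targets are judged 'present' can be made concrete with a minimal sketch. The function name `support_rate` and the gene identifiers are hypothetical placeholders, not data from this study; the sketch only assumes that a prediction is counted as evaluable when its mRNA was detected in the vector control, and as supported when its mRNA level changed on sRNA expression.

```python
def support_rate(predicted, present, changed):
    """Fraction of evaluable predictions with experimental support.

    predicted: iterable of predicted target gene names
    present:   set of genes judged 'present' in the vector control
    changed:   set of genes whose mRNA level changed on sRNA expression
    Returns None when no prediction is evaluable.
    """
    # Only predictions detected in the control can be scored at all.
    evaluable = [gene for gene in predicted if gene in present]
    if not evaluable:
        return None
    supported = [gene for gene in evaluable if gene in changed]
    return len(supported) / len(evaluable)


# Placeholder identifiers for illustration only (not real results).
predicted = ["geneA", "geneB", "geneC", "geneD", "geneE"]
present = {"geneA", "geneB", "geneC", "geneD"}
changed = {"geneA", "geneC", "geneX"}

rate = support_rate(predicted, present, changed)
print(rate)
```

In this toy case geneE is excluded from the denominator because it is 'absent'. If true targets of an sRNA tend to be weakly expressed, they never enter the evaluation, and a target regulated only at the level of translation would be scored as unsupported.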
A few caveats regarding the TargetRNA program should also be taken into consideration. The program does not account for the structures of either the sRNA or mRNA. It is possible that some predicted basepairing interactions do not occur because the corresponding regions of either the sRNA or the mRNA are occluded by secondary structure. In addition to structure, some other feature of either the sRNA or mRNA not accounted for by TargetRNA, such as the presence of an Hfq-binding site, may be required for productive basepairing. Finally, while all of the sRNAs examined here bind Hfq and are believed to act by basepairing, they may represent different classes of sRNAs and may not follow the same rules for basepairing. Because the training set used to develop the TargetRNA program included RyhB and OxyS substrates, but not OmrA and OmrB substrates, the program is not optimized for the latter RNAs. In an attempt to address this issue, we revisited the program parameters with a new training set, derived from the experiments presented here and recent results from the literature. The new training set of 25 targets included a number of OmrA and OmrB targets. However, we did not find a set of parameters that led to significant improvement in recognizing targets for the OmrA and OmrB sRNAs.
Alternatively, some sRNAs may act on only a limited number of targets, while others may have many targets. In one microarray experiment, 209 and 186 genes showed at least 2-fold changes after expression of RyhB and OxyS, respectively, whereas only 34 and 24 genes showed at least 2-fold changes after expression of OmrA and OmrB, respectively. While some of the effects of RyhB expression are known to be indirect, RyhB expression still had, in general, more global effects than OmrA or OmrB expression.
Despite some limitations of both the TargetRNA program and whole genome expression analysis, we suggest that combining the two will be an effective approach for identifying direct targets of an uncharacterized sRNA. Functional annotation may also be a useful indicator for identifying candidate targets. In several cases, the set of targets predicted by TargetRNA for a given sRNA was enriched for genes that appear functionally similar. For instance, among the top candidate targets for the sRNA GcvB were the gltI, livJ, livK, ytfT, aroP and argT mRNAs, all of which encode periplasmic transport proteins. Similarly, a number of top candidate targets for the sRNA RyhB encode non-essential iron-binding proteins.
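One way to picture the combination of computational prediction and expression analysis is as a simple filter: keep the predicted targets, in their predicted rank order, whose signal changed by at least 2-fold in either direction. The sketch below is illustrative only. The function names, the data layout (gene mapped to a pair of control and induced signals) and the gene identifiers are assumptions, not part of TargetRNA or of the analysis reported here.

```python
def changed_at_least(expression, threshold=2.0):
    """Genes whose induced/control signal ratio is >= threshold or
    <= 1/threshold. `expression` maps gene -> (control, induced)."""
    hits = set()
    for gene, (control, induced) in expression.items():
        if control <= 0 or induced <= 0:
            continue  # ratio undefined; skip rather than guess
        ratio = induced / control
        if ratio >= threshold or ratio <= 1.0 / threshold:
            hits.add(gene)
    return hits


def candidate_direct_targets(ranked_predictions, expression, threshold=2.0):
    """Predicted targets (rank order preserved) that also changed on the array."""
    changed = changed_at_least(expression, threshold)
    return [gene for gene in ranked_predictions if gene in changed]


# Placeholder signals and identifiers for illustration only.
expression = {
    "geneA": (100.0, 40.0),   # decreased 2.5-fold
    "geneB": (100.0, 90.0),   # essentially unchanged
    "geneC": (50.0, 150.0),   # increased 3-fold
    "geneD": (80.0, 20.0),    # decreased 4-fold, but not predicted
}
ranked_predictions = ["geneB", "geneA", "geneE", "geneC"]

candidates = candidate_direct_targets(ranked_predictions, expression)
print(candidates)
```

Genes that change on the array but are not predicted (geneD here) are candidates for indirect effects, while predicted genes that do not change (geneB) may be false predictions or targets regulated without a change in mRNA level.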
While the method was evaluated on targets of sRNAs in E.coli, the approach is applicable to bacteria more generally. For example, in searching for targets of the sRNA BsrA in Bacillus subtilis (using a seed of at least eight nucleotides), the program predicted the mRNA target rplU, which encodes a ribosomal protein. The rplU target of BsrA in B.subtilis has been documented previously (40). In addition, an ortholog of the rplU gene in Listeria monocytogenes was predicted as a target of the BsrA ortholog in L.monocytogenes. More generally, when searching for targets of an sRNA in a given organism, the program calculates the hybridization scores of orthologous targets with orthologous sRNAs in other bacteria. Since many sRNA genes are conserved across related species, the program can thus evaluate whether the targets are conserved and whether the hybridization interaction is conserved across species. One of the training examples was the GcvB:dppA interaction (41). For orthologous GcvB genes in S.typhimurium, S.flexneri, Yersinia pestis and Photorhabdus luminescens, the program identifies orthologous dppA targets in all four bacteria. In each of the species, the hybridization score of the orthologous GcvB gene and the orthologous dppA target places the target among the top candidate predictions.
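The seed requirement mentioned above (at least eight nucleotides) can be illustrated with a minimal sketch that finds the longest uninterrupted run of basepairs between an sRNA and an mRNA region. This is not the TargetRNA hybridization score, whose details are not given in this section. The function names are hypothetical, and treating G:U wobble pairs as allowed is an assumption exposed as a parameter.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}  # assumption: optionally allowed


def longest_seed(srna, mrna, allow_wobble=True):
    """Length of the longest contiguous antiparallel duplex.

    Both sequences are given 5'->3' as RNA strings (A, C, G, U).
    """
    pairs = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    target = mrna[::-1]  # read the mRNA 3'->5' so the strands are antiparallel
    best = 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(srna) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if (srna[i - 1], target[j - 1]) in pairs:
                # Extend the run ending at the previous position on both strands.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def has_seed(srna, mrna, min_len=8, allow_wobble=True):
    """True if the two RNAs can form at least `min_len` consecutive basepairs."""
    return longest_seed(srna, mrna, allow_wobble) >= min_len


# Toy sequences for illustration only: the mRNA region contains the
# reverse complement of the sRNA, flanked by unrelated bases.
srna = "ACGUACGUAC"
mrna = "CCCC" + "GUACGUACGU" + "CCCC"
print(longest_seed(srna, mrna), has_seed(srna, mrna))
```

Applying the same check to each orthologous sRNA and target pair gives a crude indication of whether a candidate interaction is conserved across species, although a seed match alone says nothing about the stability of the full duplex or about occluding secondary structure.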
The application of TargetRNA to the E.coli RyhB, OmrA, OmrB and OxyS RNAs has already expanded the number of known targets for these regulatory sRNAs. We anticipate that as the number of known sRNA:mRNA interactions increases, we will better understand the applicability and the limitations of in silico target prediction approaches. In addition, an expanded set of known targets will allow for further refinements of computational approaches for target prediction.