Bacterial sRNA-mediated regulatory networks have been introduced as a powerful way to analyze the fast rewiring capabilities of bacteria in response to changing environmental conditions. The identification of mRNA targets of bacterial sRNAs is essential to investigate their functional activities. However, this step remains challenging because of the lack of knowledge of the topological and biological constraints behind the formation of sRNA-mRNA duplexes. Even with the most sophisticated bioinformatics target prediction tools, the large proportion of false predictions may be prohibitive for further analyses. To deal with this issue, sRNA target analyses can be carried out from the gene lists produced by RNA-SEQ experiments when available. However, the number of resulting target candidates may still be huge and cannot be easily interpreted by domain experts, who need to compare various biological features to prioritize the target candidates. Therefore, novel strategies are needed to improve the specificity of computational prediction results before proposing new candidates for an expensive experimental validation stage.
To address this issue, we propose a new visualization tool, rNAV 2.0, for detecting and filtering bacterial sRNA targets for regulatory networks. rNAV is designed to cope with a variety of biological constraints, including gene annotations, conserved regions of interaction and specific patterns of regulation. Depending on the application, these constraints can be combined in various ways to analyze the target candidates, which are prioritized for instance by a known conserved interaction region or by a common function.
The standalone application implements a set of known algorithms and interaction techniques, and applies them to the new problem of identifying reasonable sRNA target candidates.
The online version of this article (doi:10.1186/s12859-017-1598-8) contains supplementary material, which is available to authorized users.
Bacterial small RNAs (sRNAs) are crucial regulators that often act to transmit environmental signals when bacteria meet stressful growth conditions [1–3]. These sRNAs are known to regulate changes in cell behavior (including cellular metabolism and quorum sensing, for instance) to improve the probability of survival. Although some bacterial RNA regulators have been known since the early 1980s, their involvement in numerous physiological responses and the idea of their probable universal distribution in the prokaryotic world have only emerged recently. These findings are deeply modifying our view of the way bacteria regulate gene expression to rapidly adapt their metabolism in response to environmental changes and/or stresses [4, 6]. The important role of sRNAs, acting as regulators, in the establishment of virulence has also been noted in several bacterial pathogens [6, 7]. Moreover, recent data from high-throughput technologies have shown that small RNAs are much more widespread in the bacterial world than previously expected [7–9].
Focusing on sRNAs that act as negative or positive post-transcriptional regulators by base-pairing with mRNAs, the identification of their targets is challenging and requires a better understanding of the topological and biological constraints behind the formation of sRNA-mRNA interactions. Moreover, an sRNA can have several targets, and some mRNAs may be hybridized by several sRNAs, forming SIM (Single-Input Module) or DOR (Dense Overlapping Regulon) motifs of regulation.
In the omics era, where the amount of available data keeps increasing, one of the challenges facing the use of predictive bioinformatics tools is to make use of the huge number of predicted results. Even with the most sophisticated bioinformatics target prediction tool, the large proportion of false predictions may be prohibitive for further analysis. To deal with this issue, sRNA target analysis can be run from the gene lists produced by RNA-SEQ experiments when available. However, the number of resulting target candidates may still be huge and cannot be easily interpreted by expert biologists, who need to compare various biological features to prioritize the target candidates.
Pathway analysis tools intend to deal with these issues. Accordingly, one of the best-known software packages, Cytoscape, supports many features dedicated to the visualization and exploration of large networks. Plugins must nevertheless be added to Cytoscape to integrate network-dedicated functionalities. For instance, the ReNE plugin makes it possible to query biological databases to build a regulatory network while integrating external heterogeneous data. While transcription factors and microRNAs can be analyzed together, the main limitation is that ReNE only supports the analysis of two eukaryotic organisms. For the annotation of gene sets using enrichment strategies, the most popular plugins are Bingo and ClueGO [13, 14]. At the moment, no integrated tool provides functionalities dedicated to small RNAs that support their analysis by making use of annotations together with interaction and regulatory motif information. These issues therefore need to be further investigated.
In this context, we introduce the rNAV 2.0 software, the first available release of the prototype presented in  as a proof of concept (for a comparison of its functionalities with those of Cytoscape, see Additional file 1). Our software aims at offering original facilities, using a combined strategy based on bioinformatics and visualization approaches.
Combining bioinformatics and visualization helps biologists analyze such sRNA-mediated regulatory networks by supporting the exploration and visualization of the huge number of predicted sRNA targets produced by existing bioinformatics tools.
Information Visualization has now been established as a fruitful strategy to tackle the problem posed by the abundance of information (for an overview on biological data visualization, the reader can refer to [16, 17]).
A main aspect that a visualization system such as rNAV 2.0 has to deal with is the representation of the entire network, as it eases the identification of the main trends in the data and therefore guides the user during his/her exploration and focuses his/her attention on interesting parts.
rNAV 2.0 has been developed mostly in C++, with some parts in Python, and is provided under LGPL. The software is based on the Tulip framework. Tulip is an information visualization framework for the analysis and visualization of relational data, which provides a complete library for the design of interactive visualization applications. The framework also enables the development of plugins to integrate new algorithms, visual encodings, interaction tools and domain-specific visualizations. Following Munzner’s recommendation, Tulip also supports the implementation of user interface overlays and domain-specific software.
The Model-View-Controller architecture of Tulip allows rNAV to support multiple and synchronized views. Any interaction on a view (e.g. selection of an element) implies the automatic update of all views displaying this data.
The sRNA-mediated regulation network is modeled as a bipartite graph in which nodes represent either sRNAs or mRNAs and interactions are given by edges linking an sRNA to an mRNA.
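The bipartite model described above can be sketched in a few lines of Python. This is only an illustration: the class name `RegulatoryNetwork` and its methods are hypothetical and do not correspond to the rNAV or Tulip API, and the edge list is a toy example built from gene names mentioned in this article.

```python
from collections import defaultdict


class RegulatoryNetwork:
    """Toy bipartite graph: sRNA nodes on one side, mRNA nodes on the other."""

    def __init__(self):
        self.targets = defaultdict(set)     # sRNA name -> set of mRNA names
        self.regulators = defaultdict(set)  # mRNA name -> set of sRNA names

    def add_interaction(self, srna, mrna):
        # Every edge links one sRNA to one mRNA, never two nodes of the same kind.
        self.targets[srna].add(mrna)
        self.regulators[mrna].add(srna)

    def shared_targets(self, srna_a, srna_b):
        # mRNAs that are linked to both sRNAs.
        return self.targets[srna_a] & self.targets[srna_b]


# Illustrative edges only (not a curated dataset).
network = RegulatoryNetwork()
for srna, mrna in [("RyhB", "sodB"), ("RyhB", "sdhA"),
                   ("FnrS", "sodB"), ("FnrS", "sdhA"), ("FnrS", "folE")]:
    network.add_interaction(srna, mrna)

print(sorted(network.shared_targets("RyhB", "FnrS")))
```

Keeping both directions of the adjacency makes it cheap to ask either "which mRNAs does this sRNA target?" or "which sRNAs regulate this mRNA?".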
To predict interactions between sRNAs (fasta file) and mRNAs (embl/genbank file), rNAV integrates two well-known bioinformatics tools: ssearch, used after reversing and complementing the second RNA to focus on interaction instead of homology, and IntaRNA. The functional annotation of sRNA targets is achieved with the DAVID web service, a single gene-term enrichment analysis software which exploits several biological knowledge databases (e.g. GO, KEGG Pathways, UniProt Sequence Features). DAVID computes, for a given sRNA and its putative mRNA targets, the over-represented biological annotations (as well as a significance score) that are associated with the putative sRNA-mRNA interactions.
Several techniques can be considered to visualize networks. The main options are matrix-based diagrams (visual representations of the adjacency matrix) and node-link diagrams (where nodes are represented by glyphs and edges by links between the corresponding nodes). While matrix-based diagrams are particularly useful for the visual exploration of large and dense graphs, because they make local densities easier to perceive, they do not seem to be the best option for sRNA-mediated regulatory networks. Indeed, it would be difficult to emphasize edge attributes (e.g. interacting regions, p-values, functional annotations) with such a visualization technique. In rNAV, node-link diagrams have therefore been preferred over matrix-based diagrams, as links between nodes provide a relatively large area to show edge attributes (see Fig. 1). To emphasize the bipartite structure of the network, i.e. to help the identification of sRNAs and mRNAs, different colors and shapes are used (in Fig. 1, sRNAs are drawn as blue squircles and mRNAs as orange circles). In addition, the attributes of an edge can be displayed along it (e.g. in Fig. 1, putative interacting regions are displayed).
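The reverse-complement step mentioned for ssearch can be illustrated as follows. This is a minimal sketch of the general idea, not the code used by rNAV: the function name is hypothetical, and only Watson-Crick pairs are considered (G-U wobble pairs are ignored).

```python
# Watson-Crick complement table; DNA "T" is treated like RNA "U".
_COMPLEMENT = str.maketrans("ACGUTacgut", "UGCAAugcaa")


def reverse_complement(rna):
    """Return the strand that would pair perfectly with `rna`, read 5' to 3'."""
    return rna.translate(_COMPLEMENT)[::-1]


# A stretch of an sRNA can base-pair with an mRNA stretch when it is similar
# to the reverse complement of that mRNA stretch. A similarity search run
# against reverse-complemented sequences therefore reports potential
# duplexes rather than homologous regions.
mrna_site = "AUGGC"
print(reverse_complement(mrna_site))
```

Applying the function twice gives back the original sequence, which is a convenient sanity check.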
Finally, rNAV supports multiple and linked views so as to enable the user to compare the results of different algorithms and/or different algorithms’ settings. In other words, any interaction on a view will affect the others. For instance, selecting elements in one view will also select these elements in all the views (if present in the corresponding networks).
To guide the user in his/her exploration, rNAV integrates four categories of algorithms: Filter, Coloration, Layout and Calculation. Algorithms of the Coloration and Layout categories respectively color RNAs and putative interactions according to some measures and lay the network out in the plane, whereas the Filter and Calculation categories contain algorithms that reduce the scope of the analysis. Algorithms of these two categories focus the exploration on sub-parts of the network and therefore facilitate the understanding of biological processes and the investigation of the putative cellular organization. Using Filter algorithms, the user can specify RNAs of interest according to biological knowledge (e.g. RNA names or identifiers, or regions of the primary sequences), but also according to the topological motifs they belong to (e.g. SIM and DOR motifs). Algorithms of the Calculation category compute simple yet useful graph-theoretical measures (e.g. degree or betweenness centrality). In addition, that category includes algorithms that cluster, for each RNA, its putative interactions. These clustering algorithms group putative interactions according to some distance between them. rNAV currently integrates two distances, measuring either how close their interacting regions are or how similar their functional annotations are. Using the chosen distance, MCL clustering is then performed to group similar interactions. On the one hand, the distance based on interacting regions helps to identify interactions happening on similar regions of the considered RNA; the functional annotations of each group can then be analyzed to identify clusters related to particular biological processes. On the other hand, the distance based on functional annotations supports the dual analysis: clustering first groups interactions with similar annotations, and the user can then identify whether or not a biological process is related to a particular region of the sequence.
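To make the region-based grouping concrete, here is a small sketch. Two assumptions should be stressed: the distance (one minus the Jaccard overlap of two intervals) is an illustrative choice, since the exact distance used by rNAV is not specified here, and MCL is replaced by a much simpler stand-in, namely connected components of the graph that links interactions closer than a threshold. All names and coordinates are hypothetical.

```python
def region_distance(a, b):
    """1 - Jaccard overlap of two closed intervals (start, end); 1.0 if disjoint."""
    overlap = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if overlap <= 0:
        return 1.0
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - overlap
    return 1.0 - overlap / union


def cluster_regions(regions, max_distance=0.6):
    """Group interactions whose sRNA regions are close (stand-in for MCL)."""
    names = list(regions)
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if region_distance(regions[a], regions[b]) <= max_distance:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical interacting regions on one sRNA (1-based, inclusive).
regions = {"t1": (2, 12), "t2": (4, 14), "t3": (45, 58),
           "t4": (47, 60), "t5": (1, 9)}
print(cluster_regions(regions))
```

Swapping `region_distance` for a distance on annotation sets would give the dual analysis described above without changing the clustering code.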
To extract interesting candidates from the list of all the putative interactions, the user often needs to apply several successive filters. rNAV supports the editing of algorithm pipelines, defined as ordered sequences of algorithms of any category. In such a sequence, the output of an algorithm is the input of the next one. For reproducibility, our tool also supports saving algorithm pipelines. Saved pipelines are then reachable from the Action list button in the user interface.
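The pipeline idea (an ordered list of steps, each consuming the output of the previous one, that can be saved and replayed) can be sketched as below. The registry, step names and JSON layout are assumptions made for illustration; they are not rNAV's actual file format or algorithm names.

```python
import json

# Hypothetical registry mapping step names to filtering functions.
REGISTRY = {
    "srna_is": lambda data, name: [d for d in data if d["srna"] == name],
    "score_below": lambda data, threshold: [d for d in data if d["score"] < threshold],
}


def run_pipeline(spec, interactions):
    """Apply each step in order; the output of a step feeds the next one."""
    data = interactions
    for step in spec:
        data = REGISTRY[step["algorithm"]](data, **step["params"])
    return data


# A pipeline is plain data, so it can be saved and reloaded for reproducibility.
spec = [
    {"algorithm": "srna_is", "params": {"name": "RyhB"}},
    {"algorithm": "score_below", "params": {"threshold": -5.0}},
]
restored = json.loads(json.dumps(spec))

# Toy predictions with made-up scores.
interactions = [
    {"srna": "RyhB", "mrna": "m1", "score": -8.2},
    {"srna": "RyhB", "mrna": "m2", "score": -3.1},
    {"srna": "FnrS", "mrna": "m1", "score": -9.0},
]
result = run_pipeline(restored, interactions)
print([d["mrna"] for d in result])
```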
rNAV integrates classical zoom, pan and box-selection interaction tools, as well as a dedicated functionality that displays on demand detailed information about an RNA of interest (see Fig. 1). The latter interaction tool has been adapted from  to highlight in the visualization the putative interactions of a given RNA. It also shows, as a curve drawn inside the tool, the number of interactions in which each base of the primary sequence is involved. When interactions have been clustered, the resulting clusters are displayed as colored rectangles spanning the region of the corresponding interactions (see Fig. 1 B1 and B2). These rectangles, as well as box-selection, can also be used to select neighbors of the focused RNA.
The exploration of the entire network can be time-consuming and may even require multiple analysis sessions. To help the user remember his/her previous analyses, a combination of features has been integrated in rNAV. First, rNAV integrates an exploration tree view displaying the algorithms that have been run during the analysis (see Fig. 2). In that tree, each node corresponds to a state of the analysis. In particular, the root node corresponds to the entire network. Each time an algorithm is run on the sub-network associated with a node u of the tree, a new child corresponding to the output of the algorithm is added to u. When running a pipeline of algorithms, a new branch is created in the tree, where each node of the branch corresponds to an algorithm of the pipeline. The user can also attach additional information to each node of the exploration tree, i.e. to each state of the exploration (by default, the information is the name of the corresponding algorithm). Second, the user can save the current state of his/her analysis. In particular, rNAV can save the network as well as the whole exploration tree. During later sessions, the user will thus be able to access each state of his/her previous analysis.
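The per-base curve described above amounts to counting, for every position of the sRNA, how many predicted interactions cover it. A minimal sketch, assuming 1-based inclusive coordinates and a hypothetical function name:

```python
def per_base_interaction_counts(length, regions):
    """Count, for each base 1..length, how many regions (start, end) cover it."""
    delta = [0] * (length + 2)
    for start, end in regions:
        delta[start] += 1      # a region opens at `start`
        delta[end + 1] -= 1    # and closes just after `end`
    counts, running = [], 0
    for position in range(1, length + 1):
        running += delta[position]
        counts.append(running)
    return counts


# Two toy interacting regions on a 10-nucleotide sequence.
print(per_base_interaction_counts(10, [(2, 5), (4, 8)]))
```

The difference-array approach keeps the cost linear in the sequence length plus the number of interactions, which matters when more than a thousand targets are predicted for one sRNA.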
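The exploration tree can be pictured with the following sketch, in which each node stores a state of the analysis and running an algorithm adds a child. The class, its fields and the JSON export are hypothetical and only mirror the behaviour described in the text.

```python
import json


class ExplorationNode:
    """One state of the analysis: a sub-network plus the step that produced it."""

    def __init__(self, label, subnetwork, note=None):
        self.label = label
        self.subnetwork = subnetwork
        # By default, the note is the name of the algorithm that was run.
        self.note = note if note is not None else label
        self.children = []

    def run(self, name, algorithm):
        # Running an algorithm on this state adds a child holding its output.
        child = ExplorationNode(name, algorithm(self.subnetwork))
        self.children.append(child)
        return child

    def run_pipeline(self, steps):
        # A pipeline creates one branch, with one node per algorithm.
        node = self
        for name, algorithm in steps:
            node = node.run(name, algorithm)
        return node

    def to_dict(self):
        return {"label": self.label, "note": self.note,
                "subnetwork": self.subnetwork,
                "children": [child.to_dict() for child in self.children]}


# Toy network: (sRNA, mRNA, score) triples with made-up values.
root = ExplorationNode("entire network",
                       [("RyhB", "a", -8.0), ("RyhB", "b", -2.0), ("FnrS", "a", -6.0)])
leaf = root.run_pipeline([
    ("only RyhB", lambda net: [e for e in net if e[0] == "RyhB"]),
    ("score < -5", lambda net: [e for e in net if e[2] < -5.0]),
])
saved = json.dumps(root.to_dict())  # the whole tree can be stored between sessions
print(leaf.subnetwork)
```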
In the past few years, the rapid and successful evolution of high-throughput sequencing technology has led to an exponential growth in the number of known sRNAs in bacteria. The current estimate of the number of sRNAs known to be expressed by Escherichia coli and/or Salmonella enterica is around 300. Among them, a large set of sRNAs are already known as regulators and act in trans by base-pairing with 5’ UTR mRNA regions (see sRNATarBase 3 and EcoCyc). As previously mentioned, sRNA regulation is intertwined with transcription factor regulation. One illustration is given by the mechanisms that regulate iron storage in Escherichia coli. Iron is both essential and toxic to most bacterial cells. To survive, the cell has to keep its iron level under tight control by rapidly managing the inflow and outflow of iron. In Escherichia coli, the iron level is mainly regulated by both the FUR transcription factor and the RyhB sRNA.
In addition, the regulation can be coordinated by several sRNAs in response to combined perturbations of the cell. Again, an illustration is given by the RyhB and FnrS sRNAs, which share four validated mRNA targets. These four genes are involved in different metabolic processes, covering energy metabolism (sdhA, sodB) and the response to external stress (marA and nagZ). To go further and analyze sRNA regulation using rNAV, we focused on these two sRNAs, FnrS and RyhB. In this context, we applied an analytical strategy with the objective of proposing new target candidates. In our strategy, we exploit the information that can be collected from the validated targets to prioritize the gene lists given by the high-throughput sequencing results of sRNA mutants.
To initiate this work, we first constructed the network using rNAV, from a multi-fasta file for the sRNA sequences and an embl file for the Escherichia coli genome, using adjustable parameters for the extraction of the 5’ UTR of each mRNA sequence. Interactions were computed using IntaRNA through rNAV (see Additional file 2). In each gene/mRNA subset related to a given sRNA, the enrichment of genes having an annotation of interest was assessed using a statistical test computed by the DAVID web services.
At the start of this study, the experimentally validated targets of these two sRNAs were compiled using sRNATarBase 3, EcoCyc and [28, 29] (see Table 1) and depicted using rNAV in Fig. 1. The main known information about these two sRNAs is summarized in the following paragraphs and has served as a basis for our analysis:
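As an illustration of what an enrichment test measures, the sketch below computes a plain hypergeometric upper-tail p-value: the probability of seeing at least the observed number of annotated genes in a target list drawn at random from the genome. This is a simplified stand-in chosen for clarity; the statistic reported by DAVID itself may differ (it is documented as a modified Fisher exact score), and the function name and numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, `annotated` of which carry the term of interest."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    favourable = sum(comb(annotated, k) * comb(population - annotated, sample - k)
                     for k in range(hits, upper + 1))
    return favourable / total


# Toy numbers: 10 genes, 4 annotated, 3 predicted targets, 2 of them annotated.
print(enrichment_p_value(population=10, annotated=4, sample=3, hits=2))
```

A small p-value indicates that the annotation is over-represented among the predicted targets of an sRNA, which is the signal used to prioritize candidate sub-networks.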
RyhB is an sRNA of 90 nucleotides whose expression is induced by a low level of iron in the bacterium. The analysis of RyhB was motivated by the fact that it is one of the most studied sRNAs because of its key role in the regulation of iron homeostasis. Accordingly, RyhB has been widely investigated experimentally [31, 32]. Most bacteria need to secure sufficient stocks of iron for use in essential pathways. As a consequence, numerous proteins are involved in maintaining this homeostasis and need to be regulated by sRNAs like RyhB in a precise and coordinated way. Deciphering its potential action is therefore of great interest for studying the potential of sRNA regulation in complex regulatory networks. Moreover, others of its known targets are involved in the TCA. No experimental work has been done to investigate the region which interacts with the gene targets. A prediction obtained with bioinformatics tools places it within the interval [40..70] (see sRNATarBase).
FnrS is an sRNA of 122 nucleotides. It is produced by the bacterium in anaerobic conditions under the control of the FNR and ArcA proteins. This sRNA may regulate more than 30 mRNAs, mainly involved in energy metabolism. For example, FnrS is involved in the regulation of enzymes required in the shift from aerobic to anaerobic conditions when the cell is undergoing an oxidative stress. Two interacting region seeds have been identified using mutation experiments, from positions 4 to 6 (when interacting with folE, folX and sodB) and from positions 47 to 49 (when interacting with maE and gpmA).
To illustrate the usefulness of rNAV 2.0, we exploited new experimental results that have been obtained for both sRNAs. For each sRNA, a compilation of RNA-SEQ experiments provided a list of putative gene targets whose expression is related to the sRNA. Further experimental work is required to demonstrate whether the interaction between the mRNA and the sRNA is direct or indirect. Our objective is therefore to prioritize the best candidates for designing the experiments that will validate the interaction. We carried out two types of analysis, a Bottom Up and a Top Down analysis. In the RyhB analysis, we started from the whole network, then made use of known biological constraints and translated them into filtration stages, and finally focused on the intersection with the putative targets. By contrast, for the FnrS analysis, we first looked at the putative targets (given by experimental works) and then applied different annotation filtering stages.
The following stages of this analysis are represented by the exploration tree view of rNAV: they correspond to the first two branches in Fig. 2, and the resulting sub-networks are given in Fig. 3. Using MS2-affinity purification coupled with RNA sequencing technology, 37 genes have shown a repression profile related to the expression of RyhB. Starting from the complete RyhB SIM-like motif, where more than 1000 targets were predicted by IntaRNA, we applied two consecutive filtering steps to retain only the predictions (i) with an IntaRNA score lower than –5 and (ii) with an interaction that overlaps the [35–65] sRNA region. Then, two sub-networks were extracted to show the targets with annotations related to (i) the storage and absorption of iron and (ii) the presence or absence of oxygen. We obtained 96 and 54 targets respectively, among which 28 and 18 were included in the data given by Masse et al. .
Finally, considering the intersection of both filtering stages, 15 targets are common and may be of particular interest for designing experimental protocols with the objective of identifying new targets that are involved both in iron storage/absorption and in aerobic/anaerobic energy metabolism.
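The RyhB filtering stages described above can be sketched as follows. The thresholds (score below –5, overlap with the [35–65] region) come from the text; everything else, including the record layout, the annotation keywords and the gene names `g1` to `g5`, is made up for illustration and is not the actual dataset.

```python
def overlaps(start, end, region):
    """True when the closed interval [start, end] intersects `region`."""
    return start <= region[1] and end >= region[0]


def filter_predictions(predictions, max_score=-5.0, region=(35, 65)):
    """Keep predictions with a score below `max_score` that overlap `region`."""
    return [p for p in predictions
            if p["score"] < max_score and overlaps(p["start"], p["end"], region)]


def targets_with_annotation(predictions, keyword):
    return {p["mrna"] for p in predictions if keyword in p["annotations"]}


# Hypothetical predictions for one sRNA (positions are on the sRNA).
predictions = [
    {"mrna": "g1", "score": -9.1, "start": 38, "end": 52, "annotations": {"iron", "oxygen"}},
    {"mrna": "g2", "score": -7.4, "start": 40, "end": 60, "annotations": {"iron"}},
    {"mrna": "g3", "score": -3.0, "start": 41, "end": 55, "annotations": {"iron", "oxygen"}},
    {"mrna": "g4", "score": -8.8, "start": 5, "end": 20, "annotations": {"oxygen"}},
    {"mrna": "g5", "score": -6.2, "start": 60, "end": 75, "annotations": {"oxygen"}},
]
experimental = {"g1", "g2", "g4"}  # stand-in for an experimentally derived gene list

kept = filter_predictions(predictions)
iron = targets_with_annotation(kept, "iron")
oxygen = targets_with_annotation(kept, "oxygen")

print(sorted(iron & experimental))           # iron-related and seen experimentally
print(sorted(iron & oxygen & experimental))  # candidates satisfying every constraint
```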
In a first study dedicated to FnrS, an experimental protocol over-expressing this sRNA showed an at least 2-fold decrease in the expression of 32 genes. In a second study by Boysen et al. , global proteomic and transcriptomic approaches were combined. Using a threshold of 1.5-fold under-expression, a list of 16 putative regulated targets was proposed. After subtraction of the 13 known targets, the two publications together yield 18 putative targets for which a prediction was given by IntaRNA.
Focusing on these data from the beginning of the analysis, the following stages correspond to the third and fourth branches of the exploration tree in Fig. 2, and the resulting sub-networks are given in Fig. 4. First, we used the MCL algorithm to cluster the 31 targets according to their interaction region. We obtained two region clusters which contain the two interaction seeds that were experimentally proved. Then, this position information was used in combination with the annotations obtained from a DAVID enrichment analysis. Three filtering steps were applied, using annotations related to iron storage and absorption, energy metabolism, and the presence or absence of oxygen. The results are given as sub-networks in Fig. 4. From the list of 31 genes, we highlighted three new targets (adhP, grxD, and nfsA) that satisfy biological constraints common to the experimentally validated targets.
We performed an analytical pipeline based on the sRNA-mediated regulatory network of Escherichia coli with the two sRNAs, FnrS and RyhB.
First, RyhB, whose expression is induced by a low level of iron, is known to interact with iron-containing enzymes and iron-transporting proteins. From the 37 repressed genes given by , we have highlighted a list of 15 targets using rNAV. Making use of this small list of genes, we have carried out a deeper study to prioritize a few genes according to a coordinated regulation hypothesis. For instance:
Interestingly, all of these new targets have also been identified using ribosome profiling (Ribo-seq) in a recent work demonstrating the potential of these experimental approaches for the genome-scale identification of sRNA targets. Using the same filtering steps (score and region) and according to the annotations related to (i) the storage and absorption of iron and (ii) the presence or absence of oxygen, we have obtained two lists of 7 and 4 targets (coming respectively from the lists of 96 and 54 targets). Of particular interest, the nap operons are retrieved in both annotation filtering stages (see Fig. 5).
Second, FnrS is involved in the regulation of enzymes required in the aerobic-anaerobic transition when the cell is undergoing an oxidative stress. Of particular interest for this sRNA, two interaction regions are known from the 13 validated targets. With rNAV and using the MCL algorithm on the 31 targets given by [35, 37], these two regions have been perfectly retrieved. When using functional information known for FnrS, only 7 targets are proposed, 4 of which were already known (see Table 1). While these two seed regions are involved in iron storage, only the second one interacts with genes related to aerobic/anaerobic energy metabolism. This observation may help to investigate the reasons for the presence of two interaction regions.
In both studies, the combination of additional biological features has given compelling arguments for prioritizing candidates. Moreover, rNAV improved our exploration of the sRNA-mediated networks: it helps to carry out quickly an in-depth study that seems very complex at first. This advantage is illustrated by the very small number of steps that were designed in rNAV (see the rNAV pipelines in Fig. 2) for investigating the RyhB and FnrS targets.
To address the large amounts of data generated by new sequencing technologies, the integration of additional biological information is of great importance. Focusing on sRNA-mediated networks, bioinformatics tools are necessary to prioritize regulated targets but still produce a prohibitive number of candidates. To address this problem, our strategy relies on providing a posteriori multipurpose information, based on heterogeneous data extracted from databases and on the visual exploration and analysis of the biological network. We present new software for developing bioinformatics strategies to explore sRNA-mediated networks. The methodology used in rNAV combines bioinformatics tools that predict interactions with visualization techniques, and is freely available at http://rnav.labri.fr.
rNAV 2.0 is free software licensed under LGPL. Source code is available at http://rnav.labri.fr. rNAV was developed within the EVIDEN and MycoRNA projects. It is written in C++ and Python and is available for Linux.
This work was partially done under the EVIDEN project (ANR 2010 JCJC 0201 01), supported by the ANR (France); and under the MycoRNA project, PEPS CNRS/IdEx Bordeaux, 2013. These funding bodies did not play any role in the design or conclusions of our study.
RB conceived and designed the software architecture. JD and WB implemented the software. ID and PT designed and conceived the experiments. PT built and manually curated the reference datasets. RB and PT wrote the manuscript. ID, RB and PT re-read and corrected the final manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Romain Bourqui.
Isabelle Dutour.
Jonathan Dubois.
William Benchimol.
Patricia Thébault.