The 55 sRNA-encoding genes in E.coli seem to be a much more varied class of genes than tRNA genes and rRNA genes. It is hard to determine a genomic or sequence feature that is shared by all of them. Still, common characteristics could be identified.
By examining the distribution of the sRNA-encoding genes on the genome we found a preference for the left replicore and no preference for the leading or lagging strand. We also found that the sRNA-encoding genes are not clustered in certain intergenic regions, as usually no more than one sRNA gene exists per intergenic region. These genes usually reside in intergenic regions ranging in size from 300 to 900 nt. They very rarely reside in intergenic regions >900 nt, which in E.coli usually contain repetitive sequences.
Our analysis of base composition revealed that all genes are richer in GC in comparison with intergenic sequences. sRNAs are, however, less GC-rich than the other types of genes. There does exist a sub-group of sRNAs, ‘housekeeping sRNAs’, that are richer in GC compared with the other sRNAs. These sRNAs seem to have a base composition more similar to that of structural RNA genes, such as tRNAs and rRNAs. The difference in the GC content could point to the different structural requirements associated with the function of the sRNAs; regulatory sRNAs versus housekeeping sRNAs.
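The base-composition comparison above can be illustrated with a minimal sketch. It assumes that sequences are available as plain strings; the function names `gc_content` and `mean_gc` are hypothetical and are not taken from the original analysis, which may have computed GC content differently (for example, over pooled nucleotides rather than per sequence).

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T, U).

    Ambiguous symbols such as N are ignored; an empty or fully
    ambiguous sequence yields 0.0.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    return gc / total if total else 0.0


def mean_gc(seqs) -> float:
    """Average per-sequence GC fraction over a group of sequences."""
    seqs = list(seqs)
    return sum(gc_content(s) for s in seqs) / len(seqs) if seqs else 0.0
```

Applying `mean_gc` separately to each group of sequences (sRNA genes, tRNA and rRNA genes, intergenic regions) would give values that can be compared in the way described above.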
Since three out of the five studies that led to the discovery of the 45 novel sRNAs relied on sequence conservation, it is not surprising that most of the known sRNAs are conserved in closely related bacteria. The conservation is strongest in the other E.coli strains and in S.flexneri, while only 19 of the sRNAs are also conserved in Y.pestis. Only four of the sRNAs were conserved beyond Y.pestis. Three of these sRNAs carry out housekeeping functions. No sRNA homologs were found in archaea. It is important to note that, since we examined conservation through sequence similarity alone, some sRNA homologs that maintain only structural conservation may have been missed.
Conservation analysis of the sRNA-encoding genes along with their flanking genes revealed stronger conservation in the other strains of E.coli, S.flexneri and the two Salmonella strains, than in Y.pestis. In two of the cases in which gene order was conserved in Y.pestis, the adjacent gene and the sRNA were known to be associated. This may point to a functional association for the other sRNA-encoding genes and their adjacent genes in cases where such a conservation was observed.
Most sRNAs do not show sequence similarity with other sRNAs. We found that in most cases where sRNAs were similar in sequence they were also located in neighboring genomic locations, suggesting that they may have resulted from duplication events. Indeed, most of these were previously identified as intergenic repeats (33). It could therefore be interesting to check other intergenic repeats, to see whether they too encode sRNA molecules.
We compiled the candidate sRNA genes predicted in the different studies and compared them with the annotation of 5′ UTRs, 3′ UTRs and operons reported by the study of Tjaden et al. After uniting overlapping predictions into single candidates there remain 906 candidates that are unique to a single study, 85 candidates that were predicted by two studies and 10 candidates that were predicted by three studies. We find that most candidates are not located within annotated operons, 5′ UTRs or 3′ UTRs.
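The step of uniting overlapping predictions and counting how many studies support each candidate can be sketched as follows. This is only an illustration under stated assumptions: predictions are represented as `(start, end, study)` tuples with inclusive coordinates on a linear genome, strand is ignored, and any overlap of at least one nucleotide merges two predictions. The exact criteria used in the survey are not given here, and the function names `merge_predictions` and `support_counts` are hypothetical.

```python
from collections import Counter


def merge_predictions(predictions):
    """Unite overlapping predictions into single candidates.

    predictions: iterable of (start, end, study) tuples, inclusive
    coordinates (assumed representation, strand ignored).
    Returns a list of (start, end, studies) sorted by start, where
    studies is the frozenset of studies supporting the candidate.
    """
    merged = []
    for start, end, study in sorted(predictions):
        if merged and start <= merged[-1][1]:
            # Overlaps the current candidate: extend it and record the study.
            prev_start, prev_end, studies = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), studies | {study})
        else:
            # No overlap: open a new candidate.
            merged.append((start, end, frozenset([study])))
    return merged


def support_counts(candidates):
    """Map number of supporting studies -> number of candidates."""
    return Counter(len(studies) for _, _, studies in candidates)
```

Calling `support_counts(merge_predictions(all_predictions))` on the compiled predictions would yield a tally of the same form as the one reported above (candidates supported by one, two or three studies). Note that chained overlaps are merged transitively in this sketch, which may or may not match the original procedure.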
No function is known at present for 42 out of the 55 discovered sRNAs. Our survey provides pointers that may aid in assigning functions to some of these molecules. In addition, the various characteristics we have identified may be used for the development of a refined algorithm for predicting additional sRNA-encoding genes in E.coli, as well as in closely related organisms.