Editor Stanley Maloy, San Diego State University
The vast majority of annotated transcripts in bacteria are mRNAs. Here we identify ~1,000 antisense transcripts in the model bacterium Escherichia coli. We propose that these transcripts are generated by promiscuous transcription initiation within genes and that many of them regulate expression of the overlapping gene.
The vast majority of known genes in bacteria are protein coding, and there are very few known antisense transcripts within these genes, i.e., RNAs that are encoded opposite the gene. Here we demonstrate the existence of ~1,000 antisense RNAs in the model bacterium Escherichia coli. Given the high potential for these RNAs to base pair with mRNA of the overlapping gene and the likelihood of clashes between transcription complexes of antisense and sense transcripts, we propose that antisense RNAs represent an important but overlooked class of regulatory molecule.
Recent high-throughput sequencing analyses of RNA in eukaryotes have revealed a far more complex network of RNAs than previously appreciated, including thousands of RNAs antisense to protein-coding genes (aRNAs) (1). In contrast, relatively few aRNAs have been identified in bacteria (2). Studies of individual plasmid-encoded and chromosomally encoded aRNAs in a variety of bacterial species have demonstrated that aRNAs can regulate expression of the overlapping gene at the level of translation, mRNA stability, or transcription (3–11). Several studies have hinted at the existence of many more aRNAs, in multiple bacterial species, than those currently described (5, 8, 10, 12–18), suggesting that aRNAs have a widespread regulatory function in bacteria.
We sought to identify novel aRNAs in Escherichia coli. We generated a cDNA library by extracting RNA from rapidly growing cells (wild-type strain MG1655 grown with aeration in LB to an optical density at 600 nm [OD600] of 0.7), treating the RNA with tobacco acid pyrophosphatase to convert 5′-triphosphate groups to monophosphates, ligating an RNA oligonucleotide (5′-ACACUCUUUCCCUACACGACGCUCUUCCGAUCU-3′) to the RNA 5′ ends, reverse transcribing with a primer in which the nine 3′-end proximal bases are random (5′-GTTTCCCAGTCACGATCNNNNNNNNN-3′), and amplifying by PCR. Using Solexa sequencing, we identified unique RNA 5′ ends. The mapped RNA 5′-end locations include many known transcription start sites: 24% of sequences of published transcription start sites are matched exactly by a sequence from our library, and 41% of those sequences are ≤2 bp away from a sequence from our library (19). The exact matches include the majority of known aRNAs (GadY, RyjB, RdlA, RdlD, RyeA, SokB, and SokC). The RNA 5′-end locations also include 1,005 locations that map antisense to protein-coding genes (see Table S1 in the supplemental material), suggesting the existence of many more aRNAs. These putative aRNA 5′ ends were each sequenced between 1 and 5,488 times. An additional 385 ends map antisense to known and predicted 5′ and 3′ untranslated regions (UTR) (see Table S1 in the supplemental material) (20).
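The two bookkeeping steps described above, comparing mapped 5′ ends with published start sites at a given tolerance and flagging 5′ ends that fall inside a gene on the opposite strand, can be sketched as follows. This is an illustrative sketch only, not the pipeline used in this study: the function names, the (strand, position) coordinate representation, and the toy data are all assumptions made for the example.

```python
from bisect import bisect_left

def fraction_matched(published, observed, tolerance):
    """Fraction of published (strand, position) start sites that have an
    observed 5' end on the same strand within `tolerance` bp."""
    by_strand = {}
    for strand, pos in observed:
        by_strand.setdefault(strand, []).append(pos)
    for positions in by_strand.values():
        positions.sort()
    hits = 0
    for strand, pos in published:
        positions = by_strand.get(strand, [])
        # First observed end at or beyond the lower edge of the window.
        i = bisect_left(positions, pos - tolerance)
        if i < len(positions) and positions[i] <= pos + tolerance:
            hits += 1
    return hits / len(published)

def is_antisense(end, genes):
    """True if a (strand, position) 5' end lies within the coordinates of a
    gene annotated on the opposite strand. Genes are (strand, start, stop)."""
    strand, pos = end
    return any(g_strand != strand and start <= pos <= stop
               for g_strand, start, stop in genes)

# Hypothetical toy data (not taken from the study).
published = [("+", 100), ("-", 250), ("+", 400), ("-", 900)]
observed = [("+", 100), ("-", 252), ("+", 410), ("-", 520)]
genes = [("+", 500, 800)]

exact = fraction_matched(published, observed, 0)  # exact matches only
near = fraction_matched(published, observed, 2)   # within 2 bp
antisense = [e for e in observed if is_antisense(e, genes)]
print(exact, near, antisense)
```

With the toy data, one of four published sites is matched exactly, two of four are matched within 2 bp, and one observed 5′ end is classified as antisense.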
The housekeeping σ factor σ70 binds a bipartite DNA sequence at E. coli promoters during transcription initiation. The downstream recognition site, the −10 hexamer, has the consensus sequence TATAAT and is typically positioned 7 or 8 bp upstream of the transcription start site (21). For the set of 471 published transcription start sites (19), the −10 hexamers match the consensus, on average, 3.28 times out of 6 (−10 match score) (base distribution shown in Fig. 1A). In contrast, 1,000 randomly selected sequences antisense to genes match the consensus only 2.00 times out of 6 (control match score) (base distribution shown in Fig. 1B). This difference is highly significant (Mann-Whitney U test, P of 8.9e−70). Furthermore, 46% of the RNAs with published start sites initiate with “A,” significantly more than expected by chance (P < 1e−22) (Fig. 1A and B). The −10 hexamer sequences for the 1,005 putative aRNAs identified in this work have a −10 match score of 3.27, significantly higher than the control match score (Mann-Whitney U test, P of 8.8e−102) (base distribution shown in Fig. 1C). This holds true even for the 141 aRNA 5′ ends that were sequenced only once (score of 3.12; Mann-Whitney U test, P of 2.8e−21). The −10 match score for the 1,005 aRNAs is not significantly different from that for the set of published start sites (Mann-Whitney U test, P = 0.49). Moreover, 48% of the putative aRNAs initiate with “A,” significantly more than expected by chance (P < 1e−50) (Fig. 1B and C) but not significantly different from the set of published start sites (Fisher’s exact test, P of 0.40) (Fig. 1A and C). Thus, the promoters and transcription start sites of the 1,005 putative aRNAs have DNA sequence properties that are indistinguishable from those of characterized transcripts.
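The −10 match score used above is simply the number of hexamer positions that agree with the TATAAT consensus. A minimal sketch of that calculation is given below. The way the hexamer is chosen here is an assumption made for illustration (the hexamer whose last base lies 7 or 8 bp upstream of the start site is scored, and the better of the two is kept); the function names and toy sequences are likewise hypothetical.

```python
from statistics import mean

CONSENSUS = "TATAAT"

def hexamer_score(hexamer):
    """Number of positions (0 to 6) at which a hexamer matches TATAAT."""
    return sum(a == b for a, b in zip(hexamer.upper(), CONSENSUS))

def minus10_score(upstream, last_base_positions=(7, 8)):
    """Best consensus match among candidate -10 hexamers.

    `upstream` is the non-template strand, 5'->3', ending at position -1.
    Assumption: the candidate hexamers are those whose last base sits at
    position -7 or -8, and the higher-scoring one is reported.
    """
    n = len(upstream)
    if n < max(last_base_positions) + 5:
        raise ValueError("upstream sequence is too short")
    scores = []
    for k in last_base_positions:
        # Hexamer covering positions -(k+5) .. -k.
        hexamer = upstream[n - (k + 5): n - (k - 1)]
        scores.append(hexamer_score(hexamer))
    return max(scores)

# Hypothetical toy sequences (not taken from the study).
promoters = [
    "GGGGG" + "TATAAT" + "CCCCCC",  # perfect hexamer at -12..-7
    "G" * 17,                       # no matching positions
]
scores = [minus10_score(p) for p in promoters]
print(scores, mean(scores))
```

Averaging such scores over a set of start sites gives a value comparable in form to the match scores reported above; the distributions for two sets could then be compared with a rank-based test such as the Mann-Whitney U test.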
To experimentally validate the putative aRNAs, we fused the promoter regions (up to 200 bp upstream of the putative transcription start site) of 10 aRNAs to a lacZ reporter gene and measured expression levels in a β-galactosidase assay. In 9 out of 10 cases tested, we detected lacZ expression that was significantly reduced by mutation of the −10 hexamer (Fig. 2A). We conclude that the large majority of putative aRNAs are genuine and that our transcription start site assignments are highly accurate.
We selected two mRNAs, rplJ and yrdA, that each overlap a putative aRNA. We translationally fused the mRNAs in frame to lacZ, under control of the natural mRNA promoter, and compared the expression levels of lacZ for a wild-type construct and a construct containing a mutated −10 hexamer and +1 nucleotide for the aRNA (+1 nucleotide not mutated for yrdA). Expression of lacZ increased significantly upon mutation of the aRNA promoter for rplJ but not for yrdA (Fig. 2B). This strongly suggests that the aRNA overlapping rplJ represses expression of the mRNA.
Our data demonstrate that (i) antisense transcription is widespread in E. coli and (ii) aRNAs can regulate expression of the overlapping gene. Regulation by aRNAs is likely to be widespread, since all previously characterized bacterial aRNAs regulate expression of the overlapping gene (3–11). The majority of aRNAs are likely to be noncoding due to constraints imposed by the overlapping protein-coding sequence. A small fraction of aRNAs may be mRNAs for which the 5′-end UTR is antisense to another gene; however, this is unlikely in most cases, since only 21% of aRNAs initiate ≤500 bp upstream of a known translation start site on the same strand. Since they are likely to be noncoding, aRNAs are also likely to be substrates for Rho-dependent termination, which occurs within the first few hundred nucleotides of transcription (14). We conclude that the majority of aRNAs are short (<500-nucleotide), noncoding transcripts.
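The proportion of aRNAs that initiate ≤500 bp upstream of a same-strand translation start site can be computed with a strand-aware distance check. The sketch below is an assumption-laden illustration rather than the analysis code used here: the function name, the (strand, position) representation, and the toy coordinates are hypothetical.

```python
def has_downstream_start(tss, translation_starts, max_dist=500):
    """True if a translation start on the same strand lies 0..max_dist bp
    downstream of the transcription start site (strand, position)."""
    strand, pos = tss
    # Downstream means increasing coordinates on "+" and decreasing on "-".
    sign = 1 if strand == "+" else -1
    return any(s == strand and 0 <= sign * (p - pos) <= max_dist
               for s, p in translation_starts)

# Hypothetical toy coordinates (not taken from the study).
arna_starts = [("+", 1000), ("-", 5000), ("+", 9000), ("-", 2000)]
translation_starts = [("+", 1300), ("-", 4800), ("+", 9700), ("-", 2100)]

flags = [has_downstream_start(t, translation_starts) for t in arna_starts]
fraction = sum(flags) / len(flags)
print(flags, fraction)
```

In the toy data, the first two start sites have a same-strand translation start within 500 bp downstream, the third is 700 bp away, and the fourth has its nearest same-strand start on the upstream side, so the fraction is 0.5.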
We speculate that most of the novel aRNAs are generated by promiscuous transcription initiation within genes, as has been suggested for eukaryotic genomes (22). This hypothesis is consistent with the presence of many transcription factor and σ binding sites within genes (15, 18, 23–26), the low-information sequence requirements for promoting transcription in bacteria (21), and the absence of inhibitory chromatin structure within bacterial genes (26). aRNAs are likely to have a major impact on bacterial gene expression due to the high potential for base pairing with an mRNA and the high likelihood of transcriptional interference resulting from the overlap of aRNA and mRNA transcription units. Given that aRNAs have been identified in a wide range of bacterial species, we propose that aRNAs are important regulators of gene expression in all bacteria.
NCBI short read archive accession number. Raw sequencing data are available under Accession Number SRA012168.4.
Details of 1,005 putative aRNA transcription start sites.
We thank Steve Hanes, Marlene Belfort, Randy Morse, Todd Gray, Chris Karch, Michael Keogh, David Grainger, Zarmik Moqtaderi, and Keith Derbyshire for helpful discussions. We thank the Computational Biology and Statistics and Applied Genomic Technologies Core Facilities at the Wadsworth Center, New York State Department of Health, for expert technical assistance.
Citation Dornenburg, J. E., A. M. DeVita, M. J. Palumbo, and J. T. Wade. 2010. Widespread antisense transcription in Escherichia coli. mBio 1(1):e00024-10. doi:10.1128/mBio.00024-10.