Sequencing of the first Escherichia coli (K-12) genome revealed that there were ~4,300 open reading frames (ORFs) expected to encode proteins and many stable RNA genes that encode rRNA and tRNA (3). Surprisingly, at that time only 43% of the ORFs had been previously described, and 38% could not be assigned even predicted functions. These observations suggested that there was a wealth of information still awaiting discovery even in this extremely well-studied model organism, which spurred great interest and focused studies on the ORFs of unknown function. Over the past decade it also has become increasingly apparent that there is an abundance of additional genes that play important roles in cellular physiology; these genes encode small RNAs (sRNAs) and small proteins (small ORFs [sORFs], defined as ≤50 amino acids) that were overlooked in the initial annotations due to their small size and/or lack of open reading frames (8, 16). Most of the sRNA and sORF genes are currently of unknown function; however, sRNAs and sORFs of known function often have regulatory roles, frequently in signal transduction pathways and in coordinating regulatory networks (6, 8, 16). Therefore, there is great interest in these gene classes and the identification of their functions, and two papers in this issue of the Journal of Bacteriology, by Hobbs et al. (9) and Hemm et al. (7), present new genome-based approaches to search for functions of sRNAs and sORFs.
Although directed genetic and genomic approaches (1, 2) have been quite successful in identifying functions for protein-coding genes and in reducing the number of ORFs of unknown function (10), these approaches have been less profitable for identifying sRNA and sORF gene functions. Loss of function of regulatory sRNA-encoding genes typically leads to more subtle phenotypes than those commonly observed when regulatory proteins are mutated, perhaps because sRNA regulation is usually more modulatory in nature than regulation by protein counterparts, which commonly direct large-scale changes (16). As a consequence, phenotypes associated with mutations in sRNA genes can be difficult to recognize without additional information pointing toward a specific time, event, or condition to explore. In addition, sRNA and sORF genes are significantly smaller targets for traditional genetic mutation, and sRNA genes are not subject to frameshift or nonsense mutations (since they do not code for proteins), making it more difficult to generate loss-of-function alleles by general mutation. Therefore, it is perhaps not surprising that sRNA and sORF genes have not been identified readily in classical genetic screens or selections. More global approaches, such as mRNA expression profiling and proteomic studies, also have been extremely powerful for characterization of many ORFs and have led to identification of many cellular functions. However, once again such approaches have not been as successful for studies of sRNAs and sORFs, in part because most commercially available microarrays do not represent these genes, which have only recently been identified, and because general protocols used for proteomic studies, including mass spectrometry and two-dimensional gel electrophoresis, are not well suited for studying small proteins.
The two highlighted papers from Gisela Storz's group, the reports of Hobbs et al. (9) and Hemm et al. (7), tackle the problem of characterization and identification of function for sRNAs and sORFs, respectively, and present their progress in large-scale directed studies especially suited to the study of these genes. Importantly, the approaches described in these papers are also generally applicable to functional studies of any unknown gene in E. coli or other microorganisms.
In the first paper by Hobbs et al. (9), the authors demonstrate the power of analyzing DNA bar-coded mutants in mixed population studies by uncovering mutant phenotypes associated with several small RNAs and small proteins. For these experiments, strains were generated in which genes of interest were individually replaced by bar codes that were originally designed for construction of mutant libraries in yeast (5, 11). The bar code contains two 20-mer sequences (UP and DN) that are unique for each mutation generated. Flanking the specific UP and DN sequences are three additional sequence elements (Fig. 1, common sequences com1, -2, and -3) that are identical for all bar codes. These “common” sequences serve as priming sites for PCR amplification of the UP or DN bar codes from each strain (Fig. 1, arrows 1a and 1b for UP and 2a and 2b for DN). Thus, isolation of genomic DNA from a mixed population of mutant cells allows the amplification of the UP and DN bar codes using just two sets of primers (Fig. 1). The relative abundance of each bar code within the PCR products should be representative of the relative abundance of each strain within the population. In this way, each mutant can be independently tracked, even in a culture containing many different genotypes, by hybridization of amplified bar code DNA to a commercially available DNA array designed to detect the bar codes. Note that each bar code has two sequence elements (UP and DN) that can be analyzed independently.
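The idea that bar code signal reflects strain abundance can be made concrete with a small calculation. The sketch below is illustrative only and is not the analysis used by Hobbs et al.: it assumes that each strain yields one UP and one DN array signal, normalizes each tag separately across the population, and averages the two independent estimates. The function name, strain names, and signal values are all hypothetical.

```python
from typing import Dict, Tuple


def relative_abundance(signals: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Estimate each strain's share of a mixed population.

    `signals` maps a strain name to its (UP, DN) bar code signals.
    UP and DN are normalized separately, so each tag provides an
    independent estimate; the two estimates are then averaged.
    """
    up_total = sum(up for up, _ in signals.values())
    dn_total = sum(dn for _, dn in signals.values())
    if up_total <= 0 or dn_total <= 0:
        raise ValueError("total UP and DN signals must be positive")
    return {
        strain: 0.5 * (up / up_total + dn / dn_total)
        for strain, (up, dn) in signals.items()
    }


# Hypothetical array signals for three bar-coded strains (not real data).
example_signals = {
    "mutA": (200.0, 180.0),
    "mutB": (600.0, 620.0),
    "mutC": (200.0, 200.0),
}
fractions = relative_abundance(example_signals)
for strain, fraction in fractions.items():
    print(f"{strain}: {fraction:.2f}")
```

Averaging the two tags is only one possible design choice; a large disagreement between the UP and DN estimates for a strain could instead be treated as a warning that one tag hybridizes poorly.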
The bar-coded mutant approach clearly facilitates analysis of a large number of mutants at once, providing quantitative representation of each strain present. Beyond this throughput, it is the competition for growth within a mixed population that is likely to be critical for enhancing detection of the subtle mutant phenotypes commonly associated with loss of sRNAs. For example, competitive growth experiments were required to reveal altered phenotypes for cells lacking 6S RNA in previous work (13). Thus, the use of bar-coded mutants dramatically expands the potential to search for phenotypes, even when individual mutant cells are a minority population in a complex mixture of cells, demonstrating an important advance in genetic approaches for identifying sRNA and small protein function.
Tracking individual DNA bar codes in a mixed cell population allowed the authors to specifically screen their library for strains that showed either increased or decreased resistance to their test conditions: acid or cell envelope stress. Previous studies showed that several sRNAs regulate synthesis of outer membrane proteins (15) and that 70% of the small proteins are membrane localized (8), providing a rationale that the selected stress conditions (acid or envelope stress) might target small proteins and sRNAs. Indeed, this study revealed 15 genes, encoding 6 sRNAs and 9 small proteins, with previously unknown roles in these pathways. Surprisingly, one sRNA of known function, tmRNA (SsrA), was shown to play a previously unanticipated role in cell envelope stress, demonstrating that even knowing something about how an sRNA works does not reveal its full physiological potential.
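A screen for increased or decreased resistance can be thought of as comparing each strain's share of the population with and without stress. The following is a minimal sketch under stated assumptions, not the scoring scheme of Hobbs et al.: it takes hypothetical relative abundances from an unstressed and a stressed culture, computes a log2 ratio for each strain, and applies an arbitrary twofold cutoff. The function name, the cutoff, and all values are invented for illustration.

```python
import math
from typing import Dict


def classify_resistance(
    control: Dict[str, float],
    stressed: Dict[str, float],
    threshold: float = 1.0,  # assumed cutoff: a twofold change in either direction
) -> Dict[str, str]:
    """Label each strain by its log2 change in relative abundance under stress."""
    labels = {}
    for strain, before in control.items():
        after = stressed[strain]
        if before <= 0 or after <= 0:
            raise ValueError(f"abundance for {strain} must be positive")
        score = math.log2(after / before)
        if score >= threshold:
            labels[strain] = "increased resistance"
        elif score <= -threshold:
            labels[strain] = "decreased resistance"
        else:
            labels[strain] = "no clear change"
    return labels


# Hypothetical relative abundances (fractions of the population, not real data).
control = {"mutA": 0.25, "mutB": 0.25, "mutC": 0.5}
stressed = {"mutA": 0.5, "mutB": 0.0625, "mutC": 0.4375}

labels = classify_resistance(control, stressed)
for strain, label in labels.items():
    print(f"{strain}: {label}")
```

A real analysis would also need replicate cultures and a way to handle strains that drop below the detection limit, both of which this sketch omits.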
The library of 125 DNA bar-coded mutants also provides an important new resource for the E. coli K-12 toolbox. The 125 engineered mutations specify deletions of genes encoding 49 sRNAs, 50 small proteins of 50 amino acids or fewer, 13 small proteins of 50 to 75 amino acids, DppA (a target of GcvB sRNA), 8 known stress survival proteins (SmpA, GadE, TrpA, UspA, UspB, UspD, UspE, and OxyR), and two repetitive loci, ldr and sib. Recently, a collection of an additional 99 DNA bar-coded mutations (largely in genes associated with DNA repair) was also reported (12), creating an even larger resource for the community. One can imagine that expanding this library to include all ORFs would be a desirable genetic tool. However, creation of a comprehensive ORF library would be labor-intensive, particularly if the internal kanamycin cassette initially used to insert the bar code were removed from each strain to eliminate any possible polar effects on downstream gene expression, as was done for several of the strains in the Storz collection. Nevertheless, the ease of screening DNA bar-coded mutant libraries for fitness under any kind of growth or stress conditions should make construction of a mutant library covering at least the genes of unknown function a priority. In addition, this method could be easily adapted to high-throughput approaches using robots to simplify strain handling and dispensing, once appropriately sized strain libraries are generated. Finally, the approach described here of sampling a large number of mutants at once for competition for growth provides a nice complement to recent genetic approaches developed by Butland et al. (4) and Typas et al. (14), in which identification of gene function was guided by identification of interacting genes that either enhanced or inhibited growth.
In the second paper, by Hemm et al. (7), the authors describe a systematic approach for assaying and defining conditions that induce the synthesis of low-molecular-weight proteins, which have been difficult to study. The rationale is that understanding when a gene is expressed may give insight into gene function. For this goal, the authors generated strains in which sORF genes of interest have been modified to encode a small, in-frame tag (sequential peptide affinity [SPA] tag) to facilitate protein analysis with a commercially available antibody specific to the tag. Differences in expression of SPA-tagged proteins were analyzed under a variety of carbon source or stress conditions to assess when these small proteins might be functional. In addition, for several of these proteins, the mRNA transcripts were also studied and transcription factors were identified that mediate the observed regulation. Overall, the authors found that 21 of 51 proteins tested were induced under at least one condition tested (heat shock, oxidative stress, zinc limitation, oxygen limitation, acid or envelope stress, or changes in carbon source). Remarkably, the levels of over half of these proteins were increased during heat shock, suggesting that these small proteins may play a specific role in the response to increased temperatures.
In summary, the two highlighted papers from this issue demonstrate the successful use of two approaches that can be applied globally to learn more about the functions of sRNAs and small proteins. Extension of these approaches to additional growth or stress conditions should provide an even richer data set to fully comprehend the physiological function of these exciting gene products.
The views expressed in this Commentary do not necessarily reflect the views of the journal or of ASM.
Published ahead of print on 23 October 2009.