Advances in sequencing technology have ushered in a new era of bacterial genomics. At low cost, a single individual can sequence the complete genome of a bacterial isolate in less than a week. While this explosion in genome sequencing and gene discovery is breathtaking, it also serves as a reminder that, for most bacteria, we do not know the functions of most genes in the genome [1]. Even in the model bacterium Escherichia coli, which has been studied extensively for decades, hundreds of genes are poorly annotated or entirely hypothetical [2]. It is therefore critical to develop methods for systematically elucidating gene function in microbial genomes [3].
The current paradigm is that newly sequenced bacterial genomes pass through a computational annotation pipeline that predicts gene structure and putative function. Function is predicted from sequence homology to known gene families, protein domains, and characterized enzymes. However, because most experimentally characterized genes derive from a small number of bacteria representing a tiny fraction of prokaryotic diversity, many gene families have never been experimentally characterized, and computational annotations for them are uninformative beyond “conserved domain” or “conserved hypothetical.” Furthermore, because sequence conservation weakens with evolutionary distance, computational annotations of gene function become progressively less reliable the further one moves from well-studied model bacteria such as E. coli and Bacillus subtilis [4]. Lastly, for some classes of genes (for instance, transcription factors [5] and transport proteins [6]), homology-based annotations are either vague or unreliable. Taken together, computational predictions alone, while a necessary first step towards genome annotation, are not sufficient to meet the growing challenge of assigning function to the millions of genes identified by DNA sequencing.
One attractive approach to characterizing genes on a global scale is the analysis of large-scale mutant collections. Mutants provide insight into gene function by establishing a direct link between genotype and a cellular phenotype. By correlating genes with their phenotypes, a specific gene function can often be inferred [7], [8], [9], [10]. In the post-genome era, a number of microorganisms have been subjected to large-scale mutagenesis and phenotyping efforts. In bacteria, genome-wide mutant collections have been constructed for several species, using either targeted methods [11], [12], [13] or random transposon mutagenesis [14], [15], [16].
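One way to make “correlating genes with their phenotypes” concrete is cofitness: genes whose mutants show similar fitness profiles across many conditions tend to act in the same pathway or complex. The following is a minimal sketch under stated assumptions, not the analysis used in the cited studies. It ranks genes by the Pearson correlation of their fitness profiles, and the gene names and fitness values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length fitness profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a flat profile; treat it as uninformative.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_cofit(profiles, gene, k=1):
    """Return the k genes whose fitness profiles correlate most strongly with `gene`."""
    target = profiles[gene]
    scores = [
        (other, pearson(target, profile))
        for other, profile in profiles.items()
        if other != gene
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical fitness values (log2 scale) for three genes in four conditions.
profiles = {
    "geneA": [-2.0, 0.1, -1.5, 0.0],
    "geneB": [-1.8, 0.2, -1.4, -0.1],
    "geneC": [0.1, -2.0, 0.0, 0.3],
}
print(top_cofit(profiles, "geneA"))
```

In this toy example geneB is the top cofit partner of geneA, because both mutants are sick in the first and third conditions; geneC is negatively correlated with geneA.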
Regardless of how mutant strains are generated, a key challenge is the quantitative analysis of mutant collections across the diverse range of conditions needed to identify phenotypes for the majority of genes in the genome [17]. The phenotypes of mutant collections can be assayed in high throughput either as individual strains or in pooled, competitive fitness assays. In a recent example of the former approach, the individual mutant strains of the E. coli KEIO deletion collection were assayed in hundreds of growth conditions using an agar-based colony size assay [10]. This individual-strain assay identified at least one phenotype for ~50% of E. coli genes. The pooled approach, by contrast, is best exemplified in Saccharomyces cerevisiae. Each yeast deletion strain contains a unique DNA tag (or barcode) sequence that enables the pooling and competitive fitness profiling of thousands of strains in parallel [8]. Like individual strain assays, competitive pool assays provide a relative measure of strain fitness. Nevertheless, competitive fitness assays with DNA tags have two primary advantages. First, the genome-wide mutant collection is pooled in a single tube, which simplifies experimental setup, increases throughput, and reduces problems with strain contamination. Second, and more importantly, the tag-based pooled fitness assay provides a highly quantitative measure of strain fitness regardless of whether tag abundance is measured by microarray [18], [19] or by sequencing [20].
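A simple way to turn tag abundances into a relative fitness score is to compare each tag's share of the pool after growth with its share before growth, on a log2 scale. The sketch below assumes raw tag counts (for example, from sequencing), adds a pseudocount to avoid taking the log of zero, and centers scores on the median so that a typical strain scores near 0. These choices, and the parameter name `pseudocount`, are illustrative assumptions rather than the normalization used in the studies cited above.

```python
import math
import statistics


def strain_fitness(before, after, pseudocount=1.0):
    """Median-centered log2 change in each tag's share of the pool.

    `before` and `after` map tags (or strain IDs) to raw counts.
    Tags present only in `after` are ignored.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    raw = {}
    for tag, n_before in before.items():
        n_after = after.get(tag, 0)  # a tag missing after growth had 0 reads
        share_before = (n_before + pseudocount) / total_before
        share_after = (n_after + pseudocount) / total_after
        raw[tag] = math.log2(share_after / share_before)
    # Center on the median so a typical (neutral) strain scores about 0.
    center = statistics.median(raw.values())
    return {tag: score - center for tag, score in raw.items()}


# Hypothetical counts for a pool of three tagged strains.
before = {"tagA": 99, "tagB": 99, "tagC": 99}
after = {"tagA": 199, "tagB": 199, "tagC": 49}
print(strain_fitness(before, after))
```

Here tagA and tagB score 0 and tagC scores about -2, meaning its share of the pool fell roughly four-fold relative to a typical strain. Because the median is subtracted, dividing by the pool totals only shifts every raw score by the same constant; it is kept so the uncentered values remain interpretable as changes in pool share.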
Shewanella oneidensis MR-1 is a Gram-negative γ-proteobacterium isolated from freshwater lake sediment [21]. Like most other members of the Shewanella genus, S. oneidensis MR-1 (hereafter MR-1) can use a wide variety of terminal electron acceptors, including both soluble and solid metals. As such, MR-1 has received attention for its potential roles in the bioremediation of heavy metals and in energy generation via microbial fuel cells [22]. The computationally annotated MR-1 genome contains 4,318 protein-coding genes on its main chromosome and an additional 149 protein-coding genes on a single megaplasmid [23], [24]. Based on orthology relationships (bidirectional best BLAST hits), MR-1 shares 1,639 of its 4,467 protein-coding genes (37%) with the γ-proteobacterium E. coli. A total of 1,655 genes (37%) in the MR-1 genome are annotated as hypothetical, and 83% (1,371) of these lack orthologs in E. coli.
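Bidirectional best hits, the orthology criterion used above, call a gene pair orthologous only when each gene is the other's top-scoring match in the reciprocal search. The sketch below assumes the pairwise similarity scores (for example, bit scores parsed from tabular BLAST output) are already loaded into dictionaries keyed by (query, subject) pairs. It ignores e-value cutoffs and alignment-coverage filters, and the gene identifiers are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) pairs to a similarity score such as a bit score.
    Ties are broken by keeping the first pair encountered.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted(
        (gene_a, gene_b)
        for gene_a, gene_b in best_ab.items()
        if best_ba.get(gene_b) == gene_a
    )


# Hypothetical scores between two MR-1 genes and two E. coli genes.
mr1_vs_eco = {
    ("mr1_g1", "eco_g1"): 500.0,
    ("mr1_g1", "eco_g2"): 120.0,
    ("mr1_g2", "eco_g2"): 90.0,
}
eco_vs_mr1 = {
    ("eco_g1", "mr1_g1"): 495.0,
    ("eco_g2", "mr1_g1"): 130.0,
    ("eco_g2", "mr1_g2"): 85.0,
}
print(bidirectional_best_hits(mr1_vs_eco, eco_vs_mr1))  # [('mr1_g1', 'eco_g1')]
```

The best hit of mr1_g2 is eco_g2, but the best hit of eco_g2 is mr1_g1, so that one-directional pair is not called orthologous. This reciprocity requirement is what makes the criterion conservative.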
Here we describe the functional characterization of the MR-1 genome via the generation and phenotypic analysis of a large transposon mutant collection. Using a DNA tag-based pooled fitness assay, we assayed mutant fitness for 3,355 nonessential genes in 121 diverse metabolic, redox, stress, survival, and motility conditions. In addition to identifying phenotypes for over 2,000 genes, we demonstrate that mutant fitness profiles can be used to infer specific functions for genes and operons, a subset of which we confirm experimentally. Furthermore, we demonstrate that the correlation between gene expression and mutant fitness is poor in bacteria, thus underscoring the need to complement transcriptomics with mutant phenotyping. Our strain collection and fitness dataset are valuable resources for studying microbial metal reduction and for microbiology in general, given that many previously uncharacterized MR-1 genes have orthologs in diverse bacteria.