To keep pace with exploding sequence information, cost-effective, high-throughput phenotyping technologies must be developed. Here we show that phenomic profiling in E. coli fulfills this goal. Our dataset is of great utility in identifying the function of orphan genes. Three cases (marB, lpoA, lpoB) were investigated here or in a study based on this dataset (Typas et al., Cell, 2010), and, in collaboration with others, we are actively pursuing the functions of more than 20 orphan genes, as well as annotated genes with previously unsuspected roles. Since >25% of the orphan genes are highly correlated with an annotated gene (r ≥ 0.5), this dataset provides a rapid method for function discovery.
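The correlation-based lookup described above can be illustrated with a minimal sketch. It assumes that r is a Pearson correlation computed between the fitness profiles of two strains across the same conditions; the text specifies only the cutoff r ≥ 0.5. The gene names, fitness values, and function names below are hypothetical and are not taken from the dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length phenotypic profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (norm_x * norm_y)


def best_annotated_partner(orphan_profile, annotated_profiles, threshold=0.5):
    """Return (gene, r) for the most correlated annotated gene,
    or None if no annotated gene reaches the threshold."""
    scored = ((gene, pearson(orphan_profile, profile))
              for gene, profile in annotated_profiles.items())
    gene, r = max(scored, key=lambda pair: pair[1])
    return (gene, r) if r >= threshold else None


# Hypothetical fitness scores across five conditions (illustration only).
annotated = {
    "geneA": [-3.0, 0.1, -2.5, 0.3, 0.0],
    "geneB": [0.2, -2.8, 0.1, 0.0, -3.1],
}
orphan = [-2.7, 0.0, -2.9, 0.4, 0.2]

hit = best_annotated_partner(orphan, annotated)
print(hit)
```

In this toy example the orphan profile tracks geneA, so geneA would be proposed as a functional lead for follow-up. With real data the same comparison would run over every orphan strain against every annotated strain.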
An important finding is that the most responsive orphan genes tend to be narrowly distributed among bacteria. Interestingly, our results mirror initial observations from human microbiome studies. These studies found that (a) roughly half of the functions encoded in the minimal gut metagenome (ubiquitously present in all 124 individuals screened) are both unknown and of limited evolutionary conservation (Qin et al., 2010), and (b) across 4 pan-genome species analyzed, the vast majority of non-common genes were either of unknown function (~70%) or unique family members of functions that were part of the core gene set (Nelson et al., 2010). The latter are probably species-specific additions to conserved biological processes of the pan-genome. Together, these studies argue that when computational methods based on gene conservation fail, large-scale phenomic analyses can be a second tier for assigning function. To make this approach a reality, low-cost methods for developing deletion libraries must be developed (Goodman et al., 2009). Ordered single-gene deletion libraries are currently available for only a handful of organisms (Cameron et al., 2008; de Berardinis et al., 2008; Gallagher et al., 2007; Goodman et al., 2009; Kim et al., 2010; Liu et al., 2008; Noble et al., 2010; and references in Barker et al., 2010), but advances in transposon mutagenesis make it feasible to create ordered mutant libraries in most organisms.
In E. coli, expansion of this work will rest on the ability to assess additional phenotypes through deeper exploration of phenotypic space. The greatest potential resides at the intersection of screening more diverse stresses and incorporating additional cellular readouts. Colorimetric readouts would enable measurement of transcriptional activity or biofilm formation on solid agar surfaces, and represent an immediate potential advance for phenomic profiling. High-throughput microscopy would provide a new avenue for such approaches (Werner et al., 2009).
Our dataset provides information on a substantial collection of antibiotics/antimicrobial compounds that cover a broad spectrum of drug targets, structural classes and drug generations, providing a platform for future studies focused on natural products or antimicrobials with unknown targets. Our dataset can also provide a platform for studying the mechanism behind drug interactions (Yeh et al., 2009), as shown here for the case of sulfonamides and TMP. Understanding the mechanism underlying known drug interactions may help to predict novel interactions and manipulate existing drug combinations to increase their effectiveness in the clinic.
In summary, we have generated a valuable resource for microbiologists working across a wide range of biological questions, and demonstrated the numerous and diverse applications of this dataset for inferring both gene and drug function. As the most comprehensive prokaryotic chemical genomic study to date (3979 strains × 324 conditions), our dataset will serve as a base for future studies that aim to increase information and/or resolution on both the gene and drug fronts. We hope that the usefulness of this resource will trigger analogous studies in other organisms, bringing us a step nearer to closing the gene sequence-function gap.