|Home | About | Journals | Submit | Contact Us | Français|
The explosion of sequence information in bacteria makes developing high-throughput, cost-effective approaches to matching genes with phenotypes imperative. Using E. coli as proof of principle, we show that combining large-scale chemical genomics with quantitative fitness measurements provides a high-quality data set rich in discovery. Probing growth profiles of a mutant library in hundreds of conditions in parallel yielded > 10,000 phenotypes that allowed us to study gene essentiality, discover leads for gene function and drug action, and understand higher-order organization of the bacterial chromosome. We highlight new information derived from the study, including insights into a gene involved in multiple antibiotic resistance and the synergy between a broadly used combinatory antibiotic therapy, trimethoprim and sulfonamides. This data set, publicly available at http://ecoliwiki.net/tools/chemgen/, is a valuable resource for both the microbiological and bioinformatic communities, as it provides high-confidence associations between hundreds of annotated and uncharacterized genes as well as inferences about the mode of action of several poorly understood drugs.
Before the physical basis of genes was understood, associating phenotypes with a heritable unit laid the foundation of modern genetics. Following discovery of the genetic code, linking a phenotype to the responsible gene remained the most expeditious way to unravel gene function. With the explosion of sequence information, the balance has shifted. We now have many genes of unknown function. To capitalize on the burgeoning sequence bank, it is imperative to develop high-throughput technologies that link genes to phenotypes and expedite discovery of gene function. This is particularly true for prokaryotes, which represent a major fraction of the sequenced genomes and are in the forefront of metagenomic efforts (Qin et al., 2010).
Chemical and environmental perturbations have traditionally linked phenotypes to genotypes through forward genetic screens, but reverse genetic approaches are being increasingly utilized (Barker et al., 2010). Phenotype microarrays utilize a high-resolution readout of cellular respiration to evaluate fitness of a strain in hundreds of conditions (Bochner, 2009). This approach is appropriate for studying a few strains, but is difficult to expand to genome-scale screens. In pooled growth competitions, thousands of strains are assayed in a single culture environment. Fitness values are derived from measuring strain abundance in a test relative to control condition (Giaever et al., 2004; Girgis et al., 2009; Hillenmeyer et al., 2008; Hoon et al., 2008; Lee et al., 2005; Pan et al., 2004; Parsons et al., 2006; Warner et al., 2010; Xu et al., 2007). These approaches are very efficient, but competition between strains in each condition makes it difficult to determine relative strain growth across conditions (Girgis et al., 2009), especially for strains that grow slowly even in the absence of perturbation (Lee et al., 2005). Arraying mutant strains on solid media allows independent evaluation of strain fitness, but has been used only for low-resolution measurements of entire libraries (Liu et al., 2010; Tamae et al., 2008) or for essential genes (Pathania et al., 2009). High-throughput genetic interaction studies, pioneered in yeast (Schuldiner et al., 2005; Tong et al., 2001), are complementary to chemical genomics approaches. Such analyses quantitatively measure colony growth of double mutants in high-density format on agar surfaces, and have led to numerous successes in identifying gene function and network organization (Beltrao et al., 2010). Similar methodology has been developed for E. coli (Butland et al., 2008; Typas et al., 2008).
We use E. coli to illustrate the power of applying the high-resolution quantitative fitness measurements of genetic interaction analysis to high-throughput phenotypic analysis of culturable microbes. “Phenomic profiling” provides a quantitative description of the response of all single gene deletions to physiologically relevant stresses and drug challenges. By profiling ~4000 genes in >300 perturbations, we identified thousands of phenotypes and a diverse suite of conditionally essential genes. This approach provides new insights into the chromosome organization, functional landscape and evolutionary trajectory of E. coli. It facilitates high confidence association of genes of unknown function to those of known function, as highlighted by discovery of the role of a gene involved in multiple antibiotic resistance in this manuscript and identification of two novel lipoproteins essential for peptidoglycan synthesis (Typas et al. Cell, 2010). Finally, the degree to which various gene deletions alter toxic drug effects has lead to powerful insights regarding drug mode-of-action (Kohanski et al., 2008) and we demonstrate that our analysis generates numerous leads concerning drug function.
We determined quantitative growth scores for the Keio single-gene deletion library (Baba et al., 2006); essential gene hypomorphs [C-terminally tandem-affinity tagged (Butland et al., 2005) or specific alleles]; and a small RNA/small protein knockout library (Hobbs et al., 2010) in conditions representing the range of stresses E. coli encounters. Mutant strains arrayed in high-density on agar plates (1536 colonies/plate) were grown in 324 conditions covering 114 unique stresses (Figs. (Figs.1A1A & S1A, Table S1). Colony sizes were analyzed and converted to a drug-gene score using an approach developed for quantifying genetic interactions (see Experimental Procedures). More than half of the conditions were antibiotic/antimicrobial treatments (Fig. 1A). By using a sub-inhibitory concentration series that maximally inhibited growth of the wildtype (wt) strain ≤50%, we were able to search for specific drug-gene interactions (Fig. S1A), and reduce the ability of spontaneous suppressor mutations to overtake the colony. Two independently derived clones of each mutant strain were analyzed (for sRNA mutants, a single isolate was arrayed twice) and screens were performed at least twice, enabling scores to be based on 4-6 independent measurements. Correlation between replicate colony size measurements was very high (r=0.77, Fig. 1B). The final dataset (Table S2) was comprised of scores for the 3979 mutant strains passing quality control (e.g. proper normalized colony size distribution and replicate reproducibility; see Experimental Procedures). The entire dataset is available in an interactive, searchable format and as a flat file on the E. coli wiki website (at http://ecoliwiki.net/tools/chemgen/).
The entire matrix (3979 mutants × 324 conditions) was subjected to 2-D hierarchical clustering (Fig. 1C). Drugs with similar effects cluster on the X-axis; mutants that behaved similarly cluster on the Y-axis. Notably, concentrations of the same drug, drugs of the same family and/or similar conditions clustered together as did mutants of genes known to be part of the same operon, biological pathway and/or protein complex. Zoomed insets of our clustergram illustrate examples. Genes in the rfa operon (rfaG, rfaP, rfaQ, rfaB and rfaI), which encodes enzymes that synthesize the inner and outer lipopolysaccharide (LPS) core strongly cluster together with 3/4 genes responsible for the synthesis of one of the sugar building blocks, ADP-L glycero-β-D-manno heptose (rfaD, rfaE, lpcA). Importantly, clustering reflects their shared sensitivities to a concentration series of compounds known to perturb the envelope integrity of the cell, consistent with the role of LPS. dsbA and dsbB, encoded in different operons, also cluster. The DsbA/DsbB complex generates disulfide bridges in the periplasm.
The response of each mutant strain across all conditions is denoted as its “phenotypic signature”. High correlation between two phenotypic signatures is highly predictive of known indicators of functional connection between genes. Gene pairs with correlation coefficients (r) between 0.6 and 0.8 (p-value <10−34) are more than 100-fold enriched for genes sharing common operon membership, and 150-fold enriched for genes with known protein interactions determined from low-throughput experiments (www.ecocyc.org, Fig. 1D). This benchmarking analysis indicates that our phenomics dataset is biologically meaningful. Correlated phenotypic signatures also reproduce connections between curated biological pathways (Fig. S1B). For example, electron transfer components cluster tightly (e.g. nuo genes encoding NADH dehydrogenase I complex; Fig. 1C). Their clustering reflects high sensitivity to membrane-perturbing stresses including detergents, dyes and metals, and increased resistance to aminoglycosides, in agreement with early studies that illustrate decreased aminoglycoside uptake in the absence of a fully functional electron transport chain (Girgis et al., 2009; Taber et al., 1987). All three examples described in Fig. 1C are consistent with the expectation that highly correlated phenotypic signatures are biologically meaningful (r≥ 0.6 -0.8).
A central goal of this study was to systematically evaluate the impact of every gene deletion on E. coli fitness in diverse environments, as few gene deletions in E. coli have robust reported growth phenotypes and only 8% of the genes are essential in rich media (Baba et al., 2006; Yamamoto et al., 2009). We used a statistical method to define a reliable phenotype. Briefly, we standardized the interquartile range of the distribution of scores for each screen and then determined the probability that each condition-gene interaction represented a true phenotype using a normal cumulative distribution function (see Experimental Procedures). Using a 5% probability that these phenotypes arose by chance as a cut-off (false discovery rate (FDR) ≤5%), 49% of all strains tested (1957/3979 strains; Fig. 2A) had one or more phenotypes. We refer to these genes as the “responsive genome”. This “responsive genome” is a work in progress, as it is limited to genes whose removal causes growth phenotypes in response to the stresses tested. Expanding the stresses tested and/or the readout (e.g. motility) will certainly increase this number (Girgis et al., 2007). A cumulative plot of the number of individual phenotypes per strain shows that very few genes have many phenotypes. Multi-Stress Responsive (MSR) strains (≥30 phenotypes; Table S3) participate in many cellular processes, suggesting that our stresses encompassed diverse cellular challenges (Fig. 2B). With a stringent cut-off of 5% FDR, the maximum number of phenotypes from a single screen was 173 (~4% strains; Fig. S2A), and the total number of phenotypes (13497) represent ~1% of all condition-gene pairs tested. Overall, 80% of the phenotypes were negative (gene deletion more sensitive) and 20% positive (gene deletion more resistant), consistent with recent genetic interaction analyses in S. cerevisiae (Fiedler et al., 2009) and S. pombe (Roguev et al., 2008). This suggests that removal of a gene product is more likely to decrease than enhance resistance to stress (Fig. S2B). In summary, our analysis captured numerous highly specific condition-gene responses. Clearly, this dataset can be used to assign more phenotypes at a lower confidence level. Indeed, a recent chemical genomics dataset in S. cerevisiae reported phenotypes for more than 95% of gene deletions tested, many stemming from a handful of severe stresses (Hillenmeyer et al., 2008).
Conditionally essential (CE) genes are essential for growth in a particular condition. Deletions of such genes show very small colony sizes and high confidence negative scores in particular conditions (see Experimental Procedures). We identified 197 CE genes, comprised of auxotrophs, which exhibit no growth in minimal media, and rich-media CE genes, which exhibit no growth in at least one rich media-based stress (Table S4). Importantly, our dataset had 70% overlap with a previous study of Keio Collection auxotrophs (Fig 2C, (Joyce et al., 2006) despite significant experimental differences (e.g. growth in liquid vs. solid media). Many of the remaining 30% were extremely sick, but above the stringent threshold we used to define no growth. We also identified 23 additional auxotrophs specific to alternative carbon/nitrogen sources not tested in the Joyce et al. study.
Genes essential for survival in natural environments are likely to extend beyond those required for laboratory growth, and could be targets for new antimicrobials (D’Elia et al., 2009). The 116 rich-media CE genes we identified (Fig. 2C) result from physiologically relevant stresses, and increase the current number of essential genes by roughly 30%. Interestingly, many of these gene products are located in the outer cell envelope (Fig 2D), a selective permeability barrier for gram-negative bacteria that is severely underrepresented for known essential genes (Fig. 2D). Many of the stresses generating CE phenotypes are part of the natural environment of E. coli, e.g. bile salts (Table S5), indicating that these genes are likely indispensable for E. coli to survive in vivo. Similarly, using the largest metagenomic dataset to date, Qin et al. reported that envelope-specific functions, such as adhesion, were commonly required for life in the gut (Qin et al., 2010).
A key motivation for our study was to provide phenotypes for mutants of genes without functional annotation. Using a recently assembled compendium of such “orphan genes” in E. coli (Hu et al., 2009), we find that the fraction of mutant orphan genes with phenotypes is close to that of annotated genes (Fig. 3A), but the former tend to have fewer phenotypes, indicating the power of phenomic analysis for identifying their phenotypes. Importantly, the phenotypic profiles of >25% of all orphan genes correlate strongly with those of annotated genes (r ≥ 0.5; Fig. 3B & Table S6), providing high confidence leads (p-value <10−22) for discovery of their function. As these orphans are tied to a wide variety of cellular processes (Fig. 3C), the dataset will be of broad utility.
A small fraction of orphan gene knockouts have many phenotypes. Whereas annotated genes responsible for many phenotypes are broadly distributed among bacteria, the most responsive orphans tend to be narrowly distributed (Fig. 3D). This result suggests that evolutionary conservation is not a reliable indicator of the importance of an orphan gene to the organism, and that annotating them solely by homology has limitations. Such orphans may have evolved to fulfill an important but specialized function required by the niche of the organism. In support of this idea, a multi-responsive orphan identified in this study (lpoB) is restricted to enterobacteria and regulates peptidoglycan synthesis, a conserved process ubiquitous among bacteria (Typas et al.; Cell, 2010).
Both correlated phenotypic signatures (Fig. 1C&D; Typas et al.; Cell, 2010) and anticorrelated phenotypic signatures have functional significance. For example, the phenotypic signatures of deletions of a transcriptional repressor and important target genes are likely to be anticorrelated. We find that marR− and marB− were highly anticorrelated with acrB−, whereas marR− and marB− were highly correlated (Fig. 4A). marB is a gene of unknown function in the multiple antibiotic resistance operon (marRAB), which also includes the operon repressor, marR, and its activator, marA. MarA also activates genes involved in antibiotic resistance, most importantly acrAB, encoding the major antibiotic efflux pump in E. coli (Fig. 4B; (Martin and Rosner, 2002). We explored whether MarB, like MarR, repressed MarA. Because of the inherent problems of high-throughput collections (suppressors, gene duplications, cross-contamination), we always apply stringent quality control procedures to any follow-up investigations including PCR-validation of Keio isolates, and verification that retransduced strains maintain their phenotype. As mar is a hotspot for adaptive mutations, we also sequenced the entire operon and promoter region of single deletions and the double mutants we constructed.
Deletion of either marB or marR resulted in higher MarA levels, and the double marRB mutant showed additive effects on MarA transcript level (Fig. 4C) and protein level (data not shown). These effects were observed in both Kan-marked and clean deletions (Kan cassette excised, leaving an 82nt scar). The ΔmarR strain exhibits ~2X more increase in MarA transcript levels than marR::kan (data not shown), arguing for a small polar effect of the cassette. Both marB::kan and ΔmarB exhibit the same 2-fold increase in MarA levels (data not shown). These data suggest that MarB represses MarA independently of MarR. MarB does not have the signature of a DNA-binding protein, suggesting it acts post-transcriptionally. MarA level is controlled by the Lon protease (Griffith et al., 2004), but lon− and marB− effects are additive, indicating that MarB does not function through Lon (data not shown). MarA has been proposed to scan for activation sites while bound to RNA polymerase; by direct binding to either partner, MarB could disrupt complex formation. Aternatively, MarB may function in the periplasm. As MarB has a predicted periplasmic signal sequence, it could titrate an activating ligand for mar (e.g. salicylate).
Although mar is highly studied (~200 primary publications; Pubmed), our screen provided the first lead for MarB function. MarA targets approximately 40 genes, many of which are also co-regulated by the SoxS and Rob activators, with similar DNA-binding motifs as MarA (Martin et al., 2008; Martin and Rosner, 2002). The rules of engagement are poorly understood, but each activator responds to different environmental cues and overexpression of each leads to distinct phenotypes (Warner and Levy, 2010). It is likely that tight control of each activator impacts on the final gene expression output, which is crucial for cellular proliferation. MarB may be an important player in fine-tuning the expression of MarA, especially since it is a conserved member of the mar operon, which has only recently emerged in selected enterobacteria. Strong evidence for the importance of mar operon regulation in these organisms is that mar is a hotspot for mutations conferring higher drug resistance in E. coli (Nicoloff et al., 2007; Nicoloff et al., 2006).
Tetrahydrofolate (THF) and its methyl/formylated derivatives are key molecules in all kingdoms of life for one-carbon metabolism. THF is used to synthesize glycine, methionine, purines and dTTP, in a process that leads to recycling of the THF species back to THF or dihydrofolate (DHF) (Fig. 5A). The bacterial THF biosynthesis pathway is targeted by two drugs: Sulfonamides (Sulfa) target FolP, and Trimethoprim (TMP) targets FolA (Fig. 5A). Dual inhibition by Sulfa and TMP is strongly synergistic, and therefore these drugs are almost exclusively administered in combination for treatment of ear, urinary tract and bronchial infections. Despite extensive clinical use and years of laboratory investigation, we lack a complete mechanistic understanding of why these drugs are strongly synergistic. A network feature identified by phenomic profiling could contribute to synergy.
We find that the two drug classes have major phenotypic differences. Sulfa and TMP treatments are highly correlated within their class (r=0.57 for Sulfa; 0.67 for TMP), but poorly correlated with each other (r=0.15 +/− 0.04), just slightly more than the correlation observed between all screens (r=0.025 +/− 0.12). Thus, subinhibitory TMP and Sulfa treatments have fundamentally different effects on the cell, even though both partially block THF biosynthesis. Importantly, removing enzymes acting directly downstream of THF production resulted in opposite drug sensitivities: the serine hydroxymethyltransferase mutant (glyA::kan) was sensitive only to TMP; conversely, glycine cleavage (GCV) mutants (gcvP::kan, gcvH::kan, gcvT::kan) were sensitive only to Sulfa (Fig 5B). The mutant results were reproduced in liquid culture (Fig. 5D), where glyA− TMP sensitivity is manifested as a growth rate phenotype (left panel), and gcvP− sulfamethizole (SMT) sensitivity is registered as a low stationary phase density (right panel).
GlyA and GCV lie on opposite sides of a branched pathway that converts THF to 5,10-methylene THF (5,10-mTHF; Fig 5A). As glyA and gcv mutants exhibit synthetic lethality, they are the only routes to production of this essential metabolite (Fig. 5C). A simple explanation for the differential responses of glyA− and gcvP− is that 5,10-mTHF is predominantly produced via different branches under each drug treatment. A corollary is that combination drug treatment inhibits both branches, resulting in synergistic limitation for 5,10-mTHF, before the pools of THF are depleted. In support of this idea, despite the increased sensitivity of glyA− and gcvP− to single drugs, these strains grew no more poorly than wt under the drug combination (Fig. 5D). Thus, genetically eliminating either branch of the pathway reduced but did not eliminate synergy. The downstream biosynthetic reactions are also differentially affected by TMP and Sulfa (Fig. S3A), and we are currently testing whether they partially account for the residual synergy. Streptococcus pneumoniae lacks the GCV system and exhibits significantly less drug synergy than E. coli across different growth conditions (Fig. 5E & data not shown). We performed our comparison using concentrations of TMP and SMT that caused the same relative growth defect in each species (Fig. 5E). These data together support the hypothesis that simultaneous inhibition of the branched pathway for production of 5,10-mTHF contributes to the observed anti-folate synergy in E. coli.
Our data do not indicate whether differential effects of TMP and Sulfa on GlyA and GCV result from differential inhibition of expression or activity, or the intrinsic properties of each enzyme. We favor the idea that differential metabolite accumulation and subsequent feed-forward enzymatic regulation make a contribution to the distinct cellular responses to these two drugs. Recent metabolomic flux analyses indicate that high doses of TMP lead to accumulation and depletion of select metabolites, as well as to protein-level regulation of portions of the network (Kwon et al., 2010; Kwon et al., 2008). Although a comparable analysis has not been performed for Sulfa drugs, deletion of the predicted 5-formyl-THF cycloligase, ygfA, which likely degrades 5-formyl-THF (Jeanguenin et al., 2010), clusters tightly with the gcv mutants, and exhibits sensitivity only to Sulfa drugs (Fig. S3A). That 5-formyl-THF degradation is critical only under Sulfa stress suggests differential accumulation (or requirement) of THF species under Sulfa and TMP treatments. 5-formyl-THF is a known inhibitor of several enzymes in the THF network of other organisms (Field et al., 2006; Stover and Schirch, 1993), and could act as an effective protein-level regulator. Similarly, a strain lacking a predicted alanine racemase, yggS, is sensitive only to Sulfa; D-alanine is known to inactivate GlyA (Schirch et al., 1985), and yggS− and glyA− form a synthetic lethal pair (Fig. S3B). Thus, the different cellular responses to these two drugs may be due in part to metabolite-based enzymatic regulation. An extension is that the synergy of combination therapy could rest primarily on complementary inhibition of different one-carbon biosynthesis reactions, and therefore recycling of THFs. This model would allow for synergy even with the expected additive limitation of THF production.
In summary, our results illustrate the power of phenomic profiling to yield insights into drug action and the ability of a networks view to provide new paradigms for analysis of drug interaction mechanisms, which can facilitate hypothesis-driven research on drug interactions (Bollenbach et al., 2009). This type of analysis may be generally useful in predicting drug synergies, and in explaining variable drug-drug interactions across species.
The E. coli genome is encoded on a single, circular chromosome, with a single origin of replication, oriC. Essential genes are biased to the plus (+) strand, where transcription proceeds in the same direction as DNA replication. This may avert head-on collisions between RNA and DNA polymerases that would result in aborted transcripts, truncated, or frame-shifted proteins (Rocha and Danchin, 2003). Here we show that responsive and CE genes, which are important for optimal growth of the organism, also show + strand bias (Fig. 6A). Indeed, the weighted responsive genome (responsive genes weighted by number of phenotypes identified) is heavily biased to the + strand, indicating great selective pressure to place genes important for rapid growth on the + strand. Conversely, the non-responsive genome is biased to the minus (−) strand. As our approaches expand to incorporate additional phenotypic readouts more important for cells with reduced division and DNA replication (e.g. biofilm formation), the + strand bias of responsive genes will presumably be reduced.
The chromosome is massively compacted in the cell to create the nucleoid, which is thought to contribute significantly to the organization of gene expression (Travers and Muskhelishvili, 2005; Vora et al., 2009). Chromosomal loci have spatial addresses in the cell, corresponding closely to their chromosomal position (Toro and Shapiro, 2010). Additionally, highly expressed genes associated with transcription and translation are located near the origin of replication (oriC), presumably to benefit from the “gene dosage” effect created when rapidly growing cells initiate multiple rounds of DNA replication per division (Couturier and Rocha, 2006). Projecting the spatial distribution of the responsive genes onto the circular chromosome (Fig. 6B, black trace) provides us with a snapshot of the E. coli genome from a functional perspective. This projection is based on a 100kb sliding window and therefore captures organization above the operon level (Fig. S4A; Experimental Procedures). A pattern of alternating peaks and valleys is clearly evident, indicating that responsive genes cluster spatially into large chromosomal regions separated by regions generally devoid of responsive genes. “Valleys” are comprised of spatially separated operons often transcribed from different strands, indicating that low responsiveness is a regional characteristic rather than an artifact due to large non-responsive operons. Our finding of clustering above the operon level is in accord with other studies showing that gene expression is broadly correlated across certain regions of the chromosome (Carpentier et al., 2005; Jeong et al., 2004).
The responsive genome is most enriched around oriC, which has the highest concentration of responsive genes (Fig. 6B, black trace). This area is also enriched for the most responsive genes (Fig. S4B), and for conditionally essential (CE) genes (Fig. 6B, red trace), providing strong support for the idea that the E. coli chromosome tends to store genes of high functional importance near the oriC. In contrast, the terminus region is relatively devoid of responsive genes (Fig 6B, black trace), has a paucity of broadly conserved genes (Fig 6C, red trace) and a corresponding enrichment for genes restricted to γ-proteobacteria (Fig 6C, blue trace). We postulate that the terminus region contains newly acquired genes that have yet to fully integrate into the cellular network, and tend to lack phenotypes. This could enable cells to avoid unnecessarily high expression of such genes as a consequence of the gene-dosage effect. Should this result prove true across bacterial species, it could point to a general organizing principle of circular chromosomes.
“Drug-centric” analyses are more complex than “gene-centric” analyses. Whereas genes mostly participate in a single biological process, many parameters are required to describe drug action: uptake, primary/secondary targets, efflux. Therefore, pairwise relationships between drugs are more complex than those between genes. For example, two drugs may cluster based on drug uptake, even though their primary targets differ. In addition, drug signatures are an order of magnitude larger than gene signatures (3979 vs. 324). To reduce the complexity of drug signatures, we calculated Drug-Gene Ontology (GO) scores, which represent the probability that a given GO group specifically interacts with a given drug (i.e. number of phenotypes associated with genes in the GO group vs. across the entire dataset). We used these Drug-GO scores to explore drug mode-of-action through a network-based clustering strategy (see Experimental Procedures). The position of drugs in the network (Fig. 7A) is based both on the magnitude of their Drug-Gene Ontology (GO) scores (gray) and on Drug-Drug correlations (yellow). Of the 719 significant Drug-GO interactions (p-value ≤ 10−3), which include 64 drugs and 218 GO groups, 657 were negative and only 62 were positive. Thus, disrupting a linked biological process was very likely to increase drug sensitivity (Table S7). Drug-Drug correlations increased the resolution of the network and captured drug similarities that escaped the Drug-GO analysis.
We found that drugs with the same cellular target tend to cluster. For example, drugs targeting DNA (orange) fall in the lower right, those targeting THF-biosynthesis (light green) fall on the bottom edge and those targeting peptidoglycan (PG; purple) predominantly cluster in the upper left. Interestingly, β-lactams cover the center of the PG cluster, whereas drugs targeting other steps of PG synthesis are located at the periphery. The correlation coefficients between β-lactams reveal that the similarity of their phenotypic signatures is related to their respective primary target Penicillin Binding Proteins (Fig. S5A). Interestingly, known synergistic double drug combinations (TMP/sulfonamides and mecillinam/cefsulodin) occupy spaces distinct from either individual drug, arguing that the combination elicits a different cellular response from the individual drugs. It will be interesting to determine whether this holds true for antagonistic or neutral interactions or whether these combinations elicit responses closer to one or both drugs.
Importantly, specific Drug-GO interactions suggest hypotheses for mechanism of action even for well-studied drugs. Quinolones inhibit DNA gyrase by trapping it as a quinolone adduct, whose mechanism of resolution is poorly understood (Drlica et al., 2008). One GO category, “cellular DNA catabolic process” (xseAB) selectively and specifically (p-value=10−6) interacted negatively with all four quinolones screened (Fig (Fig7B7B & S5B), expanding on a previous report of xseAB mutant sensitivity to fluoroquinolones (Tamae et al., 2008). We suggest that XseAB (exonuclease VII) is the enzyme that cleaves quinolone-bound DNA gyrase from the DNA to allow repair to proceed, a possibility we are currently exploring.
Our drug network also provides clues for the mode of action of poorly described drugs, and, conversely suggests that additional factors are required to explain the action of other drugs. Nitrofurantoin (NTF; Fig. 7B) is reported to have a multi-faceted impact on cells (McOsker and Fitzpatrick, 1994; Tu and McCalla, 1975), but our data suggests its cytotoxic effects reflect DNA damage, as it causes lesions requiring nucleotide-excision repair (NER) and activates the SOS response. NTF is the only DNA-targeting drug requiring NER, but not double-strand break repair, suggesting that its primary toxic lesion is associated with the replication fork (Fig. S5B). Additionally, our network analysis validates the idea that indolicidin, a neutrophil antimicrobial peptide, mediates its effects by compromising the inner membrane permeability of E. coli in a manner similar to the proton motive force uncoupler, CCCP (Falla et al., 1996). Finally, phleomycin and bleomycin do not cluster with DNA response drugs, suggesting they have broader cellular impact (Hecht, 2000; Yeh et al., 2006), from inducing DNA scissions (Giloni et al., 1981). These insights suggest that this phenomic dataset is a rich source for discovery of drug function and interrelationships.
To keep pace with exploding sequence information, cost-effective, high-throughput phenotyping technologies must be developed. Here we show that phenomic profiling in E. coli fulfills this goal. Our dataset is of great utility in identifying the function of orphan genes. Three cases (marB, lpoA, lpoB) were investigated here or in a study based on this dataset (Typas et al., Cell, 2010), and we are actively pursuing functional discovery of numerous (>20) orphan genes, as well as annotated genes with previously unsuspected roles in collaboration with others. Since >25% of the orphan genes are highly correlated to an annotated gene (r≥ 0.5), this dataset provides a rapid method for function discovery.
An important finding is that the most responsive orphan genes tend to be narrowly distributed among bacteria. Interestingly, our results mirror initial observations from human microbiome studies. These studies found that: a) roughly half of the functions encoded in the minimal gut metagenome (ubiquitously present in all 124 individuals screened) are both unknown and of limited evolutionary conservation (Qin et al., 2010); b) across 4 pan-genome species analyzed, the vast majority of non-common genes were of either unknown function (~70%) or unique family members of functions that were part of the core gene set (Nelson et al., 2010). The latter are probably species-specific additions to conserved biological processes of the pan-genome. Together these studies argue that when computational methods based on gene conservation fail, large-scale phenomic analyses can be a second tier for assigning function. To make this approach a reality, low cost methods for developing deletion libraries must be developed (Goodman et al., 2009). Single-gene deletion ordered libraries are currently available for only a handful of organisms [(Cameron et al., 2008; de Berardinis et al., 2008; Gallagher et al., 2007; Goodman et al., 2009; Kim et al., 2010; Liu et al., 2008; Noble et al., 2010) and references in (Barker et al., 2010)], but advances in transposon mutagenesis make it feasible to create ordered mutant libraries in most organisms. In E. coli, expansion of this work will rest on the ability to assess additional phenotypes through deeper exploration of phenotypic space. The greatest potential resides at the intersection of screening more diverse stresses and incorporating additional cellular readouts. Colorimetric readouts would enable measurement of transcriptional activity or biofilm formation on solid agar surfaces, and represent an immediate potential advance for phenomic profiling. High-throughput microscopy would provide a new avenue for such approaches (Werner et al., 2009).
Our dataset provides information on a substantial collection of antibiotics/antimicrobial compounds that cover a broad spectrum of drug targets, structural classes and drug generations, providing a platform for future studies focused on natural products or antimicrobials with unknown targets. Our dataset can also provide a platform for studying the mechanism behind drug interactions (Yeh et al., 2009), as shown here for the case of sulfonamides and TMP. Understanding the mechanism underlying known drug interactions may help to predict novel interactions and manipulate existing drug combinations to increase their effectiveness in the clinic.
In summary, we have generated a valuable resource for microbiologists studying a wide range of biology, and demonstrated the numerous and diverse applications of this dataset to infer information both on gene and drug function. As the most comprehensive prokaryotic chemical genomic study to date (3979 strains × 324 conditions), our dataset will serve as a base for future studies that aim to increase information and/or resolution on both the gene and drug fronts. We hope that the usefulness of this resource will trigger analogous studies in other organisms, bringing us a step nearer to closing the gene sequence-function gap.
Experimental procedures are partially elaborated in the text and figure legends, and are fully explicated in supplementary material.
We thank R. Kishony, J. Weissman, A. Hochschild and RG Martin for critically reading this manuscript; J. Hu and P. Thomas for hosting these data on E. coli Wiki; C. Raetz for CHIR-090 and M. Gottesman for Bicyclomycin; H. Mori for the Keio Collection; G. Storz for sharing the sRNA deletion library prior to publication; J.Greenblatt and A. Emili for SPA-tagged alleles; W. Margolin, R. Misra, T. Silhavy and B. Palsson for mutants; T. Baker and R. Sauer for controllable degradation plasmids. This work was supported by NIH R01 GM085697 & ARRA GM085697-01S1 to CAG and NJK; NIH R01 GM036278 to CAG, NIH K99GM092984 to AT, NIH AI060744 to MEW, and NIH GM078338 to SS; NIH F31 DE020206-01 and NIH T32 DE007306 (RJN support); European Molecular Biology Organization long-term fellowship (to AT); and Human Frontier Science Program Long-Term Postdoctoral Fellowship (to PB).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.