Novel drug targets are required in order to design new defenses against antibiotic-resistant pathogens. Comparative genomics provides new opportunities for finding optimal targets among previously unexplored cellular functions, based on an understanding of related biological processes in bacterial pathogens and their hosts. We describe an integrated approach to identification and prioritization of broad-spectrum drug targets. Our strategy is based on genetic footprinting in Escherichia coli followed by metabolic context analysis of essential gene orthologs in various species. Genes required for viability of E. coli in rich medium were identified on a whole-genome scale using the genetic footprinting technique. Potential target pathways were deduced from these data and compared with a panel of representative bacterial pathogens by using metabolic reconstructions from genomic data. Conserved and indispensable functions revealed by this analysis potentially represent broad-spectrum antibacterial targets. Further target prioritization involves comparison of the corresponding pathways and individual functions between pathogens and the human host. The most promising targets are validated by direct knockouts in model pathogens. The efficacy of this approach is illustrated using examples from metabolism of adenylate cofactors NAD(P), coenzyme A, and flavin adenine dinucleotide. Several drug targets within these pathways, including three distantly related adenylyltransferases (orthologs of the E. coli genes nadD, coaD, and ribF), are discussed in detail.
The growing number of antibiotic-resistant microbial pathogens (44, 46; http://www.cdc.gov/narms/default.htm) presents a serious challenge to modern medicine. The majority of existing antibiotics utilize a limited number of core chemical structures and target only a few cellular functions, such as cell wall biosynthesis, DNA replication, transcription, and translation (67). Identification of unexplored cellular functions as potential targets is a prerequisite for development of novel antibiotic chemotypes. Choosing an optimal target function is a crucial step in the long and expensive process of drug development and requires the best possible understanding of related biological processes in bacterial pathogens and their hosts.
Extensive programs utilizing genomic information to search for novel antimicrobial targets have been launched recently in industry and academia (14, 15, 69, 72, 91). Complete genome sequences of multiple bacterial species, including many major pathogens, have become available in the last few years, with many more such projects under way (16). The abundance of genomic data has enabled the development of novel postgenomic experimental and computational techniques aimed at drug target discovery. Experimental approaches to genome-wide identification of genes essential for cell viability in several microbial species have been reviewed recently (23, 43, 54, 59, 67, 77, 79). These techniques are based on either systematic gene inactivation by directed mutagenesis on a whole-genome scale or high-throughput random transposon mutagenesis. The major advantage of the latter technique is the parallel analysis of thousands of genes under multiple growth conditions.
A transposon-based approach termed “genetic footprinting” was originally described for Saccharomyces cerevisiae (84, 85). Genetic footprinting is a three-step process: (i) random transposon mutagenesis of a large number of cells, (ii) competitive outgrowth of the mutagenized population under various selective conditions, and (iii) analysis of individual mutants surviving in the population using direct sequencing or various hybridization and PCR-based techniques. Various modifications of genetic footprinting have been recently applied to several microorganisms, including Mycoplasma genitalium and Mycoplasma pneumoniae (49), Pseudomonas aeruginosa (99), Helicobacter pylori (52), and Escherichia coli (7, 45). Another version of this method, termed genomic analysis and mapping by in vitro transposition, has been developed for Haemophilus influenzae (1, 75) and Streptococcus pneumoniae (1). Direct application of this technique is usually limited to microbial species with natural competence, since transposon mutagenesis is performed in vitro on isolated DNA fragments, and mutations are introduced into the genome by transformation with linear DNA fragments followed by gene conversion. By targeting a specific genomic region, this approach increases insertion density, improving resolution of genetic footprinting. An elegant extension of this method beyond naturally competent species was described for P. aeruginosa (99).
Genetic footprinting in E. coli by mini-Tn10 mutagenesis in vivo has been recently reported (7, 45). In our systematic analysis of E. coli gene essentiality, we have chosen an approach based on the use of the “transposome,” precut transposon DNA in a stable complex with modified Tn5 transposase (40, 47). This approach has several advantages over mini-Tn10 mutagenesis: (i) only single irreversible insertions are produced, since the only source of transposase activity is the protein within a transposome complex; (ii) there is no need to assemble an elaborate transposon delivery vector system with tight regulation of replication and transposase expression; and (iii) a limit of one insertion per cell can be achieved based on the ratio of transposome complexes to competent cells at the time of transformation.
Computational techniques for the identification of potential drug targets based on genomic data have been reviewed recently (35, 37, 67, 79). In a few studies, in silico analysis was successfully combined with experimental techniques (4, 23, 36). For example, Arigoni and coauthors used comparative genome analysis to identify previously uncharacterized genes as potential broad-spectrum targets by emphasizing genes which are (i) broadly conserved in various bacteria, including pathogens; (ii) not conserved in humans; and (iii) likely to encode soluble proteins. The essentiality of selected genes was further assessed by directed knockouts in E. coli and Bacillus subtilis (4). Most in silico target identification techniques are focused on formal comparative sequence analysis, without attempting to assess conservation of the overall biological context in various pathogens and the human host. We believe that comparative analysis of pathways and biological subsystems may significantly improve our ability to select potential targets.
Here we describe an approach to the identification and prioritization of broad-spectrum antimicrobial targets with known functional roles. We use a list of essential genes inferred by genetic footprinting in a model microorganism (E. coli) as a starting point for analysis of the corresponding functional roles in the context of metabolic pathways in microbial pathogens. This kind of comparative analysis based on metabolic reconstruction from genomic data indicates suitable target pathways and target functions within these pathways, as well as the potential range of pathogens for each target. Comparison of microbial pathways and individual target proteins with their human counterparts allows further target prioritization. Finally, prospective targets are validated by directed knockouts in model pathogens (Staphylococcus aureus or H. influenzae). The corresponding proteins are cloned from the representative pathogens and human cDNA libraries, expressed and analyzed in detail. The most advanced target proteins from representative bacterial sources, as well as their human counterparts, are crystallized, and their structures are determined to assist in further drug development.
We illustrate this approach using examples from the biosynthesis of three adenylate cofactors, NAD, coenzyme A (CoA), and flavin adenine dinucleotide (FAD), presenting several promising targets for the development of novel broad-spectrum antibiotics.
E. coli strains MG1655 (F− lambda− ilvG rfb50 rph1) (53) and DH10B [F′ mcrA Δ(mrr-hsdRMS-mcrBC) φ80lacZΔM15 ΔlacX74 endA1 recA1 deoR Δ(ara-leu)7697 araD139 galU galK nupG rpsL] (42) and S. aureus strain RN4220 (55) were used in this work.
Plasmid pMOD<MCS> containing artificial transposon EZ::TN<KAN-2> (Epicentre Technologies, Madison, Wis.) was isolated from the same strain (MG1655 or DH10B) that was subsequently mutagenized to avoid restriction-modification problems. Transposon DNA was released by PvuII digestion, as recommended by the manufacturer, and gel purified using QIAquick gel extraction columns (Qiagen, Valencia, Calif.). Transposomes were preformed by incubating transposon DNA (7 ng/μl) with hyperactive Tn5 EZ::TN transposase (0.1 U/μl; Epicentre Technologies and generous gift from W. Reznikoff) in a solution containing 40 mM Tris-acetate (pH 7.5), 100 mM potassium glutamate, 0.1 mM EDTA, 1 mM dithiothreitol, and tRNA (0.1 mg/ml). Samples were incubated for 30 min at 37°C and dialyzed against 10 mM Tris-acetate, pH 7.5, plus 1 mM EDTA on 0.05-μm filters (Millipore, Bedford, Mass.) for 1 h. Dialyzed samples were mixed with electrocompetent E. coli in a 1:2 ratio (vol/vol) and transformed by electroporation. Cultures were immediately diluted with a Luria-Bertani-based rich medium (see below) without kanamycin and incubated at 37°C for 40 min with gentle agitation. The efficiency of electroporation for E. coli strains MG1655 and DH10B was 5 × 10⁴ and 2 × 10⁶ kanamycin-resistant colonies per 1 μg of transposon DNA, respectively.
Half of the mutagenized population was immediately frozen and stored as the time zero sample. The rest of the culture was used to inoculate a BIOFLO 2000 fermentor (New Brunswick Scientific, Edison, N.J.) containing 950 ml of the following medium: tryptone, 10 g/liter; yeast extract, 5 g/liter; 50 mM NaCl, 9.5 mM NH4Cl, 0.528 mM MgCl2, 0.276 mM K2SO4, 0.01 mM FeSO4, 5 × 10⁻⁴ mM CaCl2, and 1.32 mM K2HPO4. This medium was supplemented with the following micronutrients: 3 × 10⁻⁶ mM (NH4)6Mo7O24, 4 × 10⁻⁴ mM H3BO3, 3 × 10⁻⁵ mM CoCl2, 10⁻⁵ mM CuSO4, 8 × 10⁻⁵ mM MnCl2, and 10⁻⁵ mM ZnSO4 (68). The following vitamins were added (concentrations are in milligrams per liter): biotin, 0.12; riboflavin, 0.8; pantothenic acid, 10.8; niacinamide, 12.0; pyridoxine, 2.8; thiamine, 4.0; lipoic acid, 2.0; folic acid, 0.08; and p-aminobenzoic acid, 1.37. Kanamycin was added to 10 μg/ml. Throughout the fermentation, the temperature was held at 37°C, dissolved oxygen was held at 30 to 50% of saturation, and the pH was held at 6.95 (via titration with 5% H3PO4). Media and growth conditions were designed to minimize the number of genes required for cell survival. Cells were grown in batch culture for 23 population doublings (12 h) to a cell density of 1.4 × 10⁹. Genomic DNA was isolated and used to generate genetic footprints.
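As a quick check on the outgrowth figures above, the mean doubling time and the fold expansion implied by 23 doublings in 12 h can be computed directly. This is a minimal sketch; the function names are illustrative and are not part of the original protocol.

```python
def doubling_time_minutes(hours: float, doublings: float) -> float:
    """Mean doubling time implied by a batch culture of known duration."""
    return hours * 60.0 / doublings


def fold_expansion(doublings: int) -> int:
    """Fold increase in cell number after the given number of doublings."""
    return 2 ** doublings


if __name__ == "__main__":
    print(f"{doubling_time_minutes(12, 23):.1f} min per doubling")
    print(f"{fold_expansion(23):,}-fold expansion")
```

With the values reported here, this gives a mean doubling time of roughly 31 min and an expansion of about 8.4 million-fold.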
Two pairs of primers were used consecutively, with the second pair of primers nested within the first (see Fig. 2A). Each primer pair contained one transposon-specific primer and one chromosome-specific primer. Chromosome-specific landmark primers were designed as an ordered set of unidirectional primer pairs covering the entire E. coli genome by using custom software. Pairs were separated on average by 3,500 bp, while primers within each pair were separated by the shortest possible distance in the range −3 to 900 bp. Average primer length was 27 bp (sequences available upon request). Transposon-specific primers were chosen to avoid any significant similarity with the E. coli chromosome, using PrimerSelect software (DNASTAR, Inc., Madison, Wis.). Two pairs of nested, outwardly directed transposon-specific primers (one at each end) were used to detect transposons inserted in both orientations (see Fig. 2A). The forward primer pair includes an external primer, 5′-GTTCCGTGGCAAAGCAAAAGTTCAA-3′, and an internal primer, 5′-GGTCCACCTACAACAAAGCTCTCATCA-3′. The reverse primer pair includes an external primer, 5′-CCGACATTATCGCGAGCCCATTTAT-3′, and an internal primer, 5′-GCAAGACGTTTCCCGTTGAATATGGC-3′.
The first of two consecutive PCR amplifications (the external PCR) was performed under the following conditions: 95°C for 1 min; 94°C for 12 s, 70°C for 6 min (2 cycles); 94°C for 12 s, 69°C for 6 min (2 cycles); 94°C for 12 s, 68°C for 6 min (36 cycles); 68°C for 6 min. Amplification reactions contained 0.3 μg of template DNA (equivalent of 6 × 10⁷ E. coli genomes), a 0.2 mM concentration of each deoxynucleoside triphosphate, a 0.4 μM concentration of each primer, PCR buffer (40 mM Tricine-KOH [pH 9.2], 15 mM potassium acetate, 3.5 mM magnesium acetate, bovine serum albumin [3.75 μg/ml]), and 0.4 μl of Advantage cDNA Polymerase Mix (Clontech Laboratories, Palo Alto, Calif.) in 20 μl. The second internal PCR was performed in the same reaction mix, except the DNA templates consisted of the products of the first PCR diluted 10³-fold. Amplification conditions for internal PCR were as follows: 95°C for 1 min; 94°C for 12 s, 69°C for 6 min (2 cycles); 94°C for 12 s, 68°C for 6 min (9 cycles); 68°C for 6 min. The products of the internal PCRs (3-μl aliquots) were size separated on 0.65% agarose gels. All insert detection and analysis procedures were performed in a 96-well format.
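The genome-equivalent figure quoted for the template DNA can be reproduced from the template mass and the genome size. The sketch below assumes an average mass of about 650 g/mol per base pair of double-stranded DNA (a commonly used approximation, not a value given in this article) and uses the E. coli genome size of 4,639 kb stated elsewhere in the text; the function name is illustrative.

```python
AVOGADRO = 6.022e23          # molecules per mole
AVG_BP_MASS = 650.0          # g/mol per base pair of dsDNA (assumed approximation)


def genome_equivalents(dna_micrograms: float, genome_bp: int,
                       bp_mass: float = AVG_BP_MASS) -> float:
    """Number of genome copies contained in a given mass of genomic DNA."""
    grams_per_genome = genome_bp * bp_mass / AVOGADRO
    return dna_micrograms * 1e-6 / grams_per_genome


if __name__ == "__main__":
    copies = genome_equivalents(0.3, 4_639_000)
    print(f"{copies:.2e} genome equivalents in 0.3 ug of template")
```

Under these assumptions, 0.3 μg of template corresponds to approximately 6 × 10⁷ genome copies, in agreement with the figure given above.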
Image capture and analysis were performed with 1D Image Analysis Software (Eastman Kodak Company, Rochester, N.Y.). Mapping of the detected Tn5 insertions was done using custom software, which calculates insert positions within a genome sequence using the addresses of the internal landmark primers and the size of the corresponding PCR products. Visualization of insert locations was done using custom software integrated into the ERGO database. The microbial genome database WIT, a predecessor of ERGO, has been described previously (71).
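The mapping step can be illustrated with a short sketch. The custom software itself is not described in detail, so the following is only one plausible formulation: the insert coordinate is taken as the landmark primer address plus (or minus, for a primer on the opposite strand) the chromosomal portion of the PCR product. The class name, the function name, and the `tn_primer_offset` parameter (the distance from the internal transposon primer to the transposon end) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LandmarkPrimer:
    position: int  # genome coordinate of the primer's 5' end, in bp
    strand: int    # +1 if the primer extends toward higher coordinates, -1 otherwise


def insert_position(primer: LandmarkPrimer, product_bp: int,
                    tn_primer_offset: int = 0) -> int:
    """Estimate the genome coordinate of a transposon insertion.

    product_bp is the size of the internal PCR product read from the gel;
    tn_primer_offset is the part of the product that lies inside the
    transposon (hypothetical parameter, 0 by default).
    """
    chromosomal_bp = product_bp - tn_primer_offset
    if chromosomal_bp < 0:
        raise ValueError("PCR product is shorter than the transposon-internal segment")
    return primer.position + primer.strand * chromosomal_bp


if __name__ == "__main__":
    primer = LandmarkPrimer(position=100_000, strand=+1)
    print(insert_position(primer, product_bp=1_500))
```

Keeping the strand as an explicit sign lets the same function serve landmark primers pointing in either direction along the chromosome.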
Reaction conditions were optimized using a small set of control genes with known essentiality, in order to detect the maximum number of inserts, while keeping the level of PCR-introduced noise very low. First, the minimum amount of genomic DNA that contained the representative mix of all mutant chromosomes and consistently yielded reproducible patterns of bands in a PCR was determined. Second, the products of external and corresponding internal PCRs were analyzed side by side on an agarose gel. This comparison was used as a guide for optimizing the PCR parameters: cooling rate, number of cycles in external versus internal PCR, and the rate of PCR products' dilutions. PCR conditions were fine-tuned to yield the minimum number of false products, namely, internal PCR bands lacking the corresponding external PCR product. Third, genetic footprints of a 4-kb nadD locus were generated independently using identical template DNA samples and four different chromosome-specific nested primer pairs. All four footprints yielded coherent patterns, consistently visualizing 10 transposon inserts in the area (data not shown). Finally, 12 random internal PCR bands were gel purified and sequenced. All the bands originated from the expected chromosomal loci. Comparison of the calculated insert locations with the sequencing results indicated a mapping error introduced by gel electrophoresis of about 4.5% of the size of each band.
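Because the mapping error scales with band size, the positional uncertainty of an insert grows with its distance from the landmark primer. The sketch below turns the approximately 4.5% figure into an uncertainty in base pairs; treating it as a symmetric plus-or-minus bound is a simplifying assumption, and the function names are illustrative.

```python
def mapping_uncertainty_bp(band_bp: float, relative_error: float = 0.045) -> float:
    """Approximate positional uncertainty (in bp) for a band of the given size."""
    return band_bp * relative_error


def position_interval(position_bp: float, band_bp: float,
                      relative_error: float = 0.045) -> tuple:
    """Interval (low, high) around a mapped position, assuming a symmetric error."""
    half_width = mapping_uncertainty_bp(band_bp, relative_error)
    return (position_bp - half_width, position_bp + half_width)


if __name__ == "__main__":
    for band in (500, 2_000, 3_500):
        print(band, "bp band ->", round(mapping_uncertainty_bp(band)), "bp uncertainty")
```

For a 2,000-bp band, for instance, the uncertainty under this assumption is about 90 bp.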
The versions of genomes used in this work are as follows: E. coli K-12 MG1655 (18) (GenBank accession no. U00096), P. aeruginosa PAO1 (88) (GenBank accession no. AE004091), Bacillus anthracis Ames (http://www.tigr.org/tdb/mdb/mdbinprogress.html), Mycobacterium tuberculosis H37Rv (24) (GenBank accession no. AL123456), H. pylori J99 (3) (GenBank accession no. AE001439), S. aureus COL (http://www.tigr.org/tdb/mdb/mdbinprogress.html/), S. pneumoniae (GenBank accession no. AE005672), M. genitalium (34) (GenBank accession no. L43967), H. influenzae Rd (32) (GenBank accession no. L42023), Chlamydia trachomatis (87) (GenBank accession no. AE001273), and Homo sapiens (http://www.ncbi.nlm.nih.gov/genome/guide/human/).
Comparisons of the specific biochemical capabilities of E. coli, relevant bacterial pathogens, and the human host were based on metabolic reconstructions performed in the ERGO database. Metabolic reconstructions in ERGO are tentative projections of all known biochemical pathways onto a specific organism with a completely sequenced and annotated genome (described in reference 82). These projections are based on the presence or absence of orthologs of specific genes known, on the basis of studies with other organisms, to be involved in corresponding pathways. This approach produces an estimate of the metabolic potential of a given organism (many asserted pathways may or may not be actually expressed under specific conditions) but without direct experimental data it falls short of defining actual metabolic fluxes.
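The presence-or-absence logic behind such projections can be sketched in a few lines. This is not the ERGO implementation; it is a minimal illustration in which a pathway is an ordered list of functional roles and is asserted for an organism only when an ortholog is found for every role. The grouping of NaMNAT, NADS, and NADK into a single pathway and the example role sets are illustrative assumptions, not data from this study.

```python
from typing import Dict, List, Set

# Functional roles grouped into one pathway for illustration only.
PATHWAYS: Dict[str, List[str]] = {
    "NAD(P) common pathway": ["NaMNAT", "NADS", "NADK"],
}


def project_pathways(pathways: Dict[str, List[str]],
                     roles_with_orthologs: Set[str]) -> Dict[str, List[str]]:
    """Map each pathway to the roles that lack an ortholog in the organism.

    An empty list means every role is covered, so the pathway is asserted.
    """
    return {
        name: [role for role in roles if role not in roles_with_orthologs]
        for name, roles in pathways.items()
    }


if __name__ == "__main__":
    # Hypothetical organism in which no NADS ortholog was detected.
    toy_organism = {"NaMNAT", "NADK"}
    for name, missing in project_pathways(PATHWAYS, toy_organism).items():
        status = "asserted" if not missing else "incomplete, missing " + ", ".join(missing)
        print(name + ": " + status)
```

As noted above, an asserted pathway indicates metabolic potential only; it says nothing about expression or flux under particular conditions.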
Examples illustrating this type of analysis are given in Tables 2 to 4 for three metabolic subsystems (groups of pathways) related to biosynthesis of the adenylate cofactors NAD(P), CoA, and FAD. We define a pathway as a series of consecutive transformations without bifurcations (82). Merging and/or alternative pathways (numbered I and II, etc., in Table 2) are combined in higher hierarchical blocks, which ultimately form a subsystem. In contrast to pathways, subsystems do not have formally defined boundaries, and they are assembled based on conventions in the field (such as cofactor biosynthesis described in references 5, 12, and 13), as well as on biological expertise and the interests of a particular user.
The comparisons between human and bacterial sequences were done using the HMMER 2.2 program (30; http://hmmer.wustl.edu/), with multiple-sequence alignments obtained from CLUSTAL-W (90). We first built a profile hidden Markov model (HMM) for each target enzyme, using the ortholog sequences from the pathogens of interest, as specified in Fig. 5A. HMMs for the following enzymes were built using the full-length protein sequences of nicotinic acid mononucleotide adenylyltransferase (NaMNAT), NAD synthetase (NADS), NAD kinase (NADK), pantothenate kinase (PK), phosphopantetheine adenylyltransferase (PPAT), and dephospho-CoA kinase (DPCK). Protein domains roughly corresponding to amino acid residues 1 to 212 and 213 to 413 of the E. coli CoaBC protein were used for the phosphopantothenoylcysteine decarboxylase (PPCDC) and phosphopantothenoylcysteine synthetase (PPCS) HMMs, respectively. Likewise, protein domains approximately corresponding to amino acid residues 1 to 176 and 177 to 214 of the E. coli RibF protein were used to build profile HMMs of FAD synthase (FADS) and flavokinase (FK), respectively. Obtained HMMs were then run against each bacterial and human sequence, using a database size of 10⁶ and a high-cutoff E value of 10⁴. The resulting E values give an indication of how closely the bacterial sequences are related and how remote the human sequence is from the conserved core of the bacterial group (the higher the E value is, the higher the divergence is).
For direct verification of gene essentiality in E. coli we used pKO3-based allelic exchange (58). This is briefly illustrated here using nadD as an example. Deletion of this gene (predicted to be essential) was attempted in parallel with nadR (a nonessential gene from the same metabolic subsystem) as a control. The E. coli nadD (12, 63) (formerly ybeN) and nadR genes with their flanking regions were PCR amplified using MG1655 genomic DNA as a template and the following primers: 5′-CAGCTGATTCGTAAGCTGCCAAGCATC-3′ and 5′-GGGGTCGACTCACTCACGGTGATAAGGATGGTTGGTGGTGATG-3′ for nadD and 5′-TGGCCTGCCGACTGACAATCTC-3′ and 5′-GCTGGAAAACGCCCTGCTGGAGT-3′ for nadR. Both PCR products were cloned into the gene replacement vector pKO3 (58). Next, the 650-bp HincII-ClaI fragment of nadR was replaced with the 1.7-kb Eco57I DNA fragment containing a tetracycline resistance cassette of pACYC184 (GenBank accession no. X06403). Similarly, the 434-bp ApaLI-BglII fragment of nadD was excised and replaced with the kanamycin resistance cassette from pUC4K (GenBank accession no. X06404). E. coli strain MG1655 was transformed with these deletion plasmids. Following cointegration, resolution, and elimination of the plasmids, Kmr Sucr colonies were screened for sensitivity to chloramphenicol (58). Final verification of gene replacement was done by PCR analysis with primers flanking the region of recombination.
In the case of nadR, gene replacement was readily achieved: 98% of Kmr Sucr colonies were also Cms, and all of the Kmr Sucr Cms colonies tested by PCR contained the expected deletion in the chromosome. In the case of nadD, no Kmr Sucr Cms colonies were recovered (out of 192 Kmr Sucr colonies screened), suggesting that nadD is essential for E. coli growth in rich medium.
Allelic exchange vector pBT2 was utilized for verification of gene essentiality in S. aureus (20). The nadD (gi|15927174) and pncA (gi|15927492) S. aureus orthologs were PCR amplified using primers 5′-ATTCCTTGTCGCCCGTTATGC-3′ and 5′-AACGCGCTTCATTGTATCCT-3′ for nadD and 5′-GTCCGTTAATCCCACAAGCATCA-3′ and 5′-CGCCGACTTTATCTTTTTCAGC-3′ for pncA. Genomic DNA isolated from S. aureus strain ATCC 29213 was used as a PCR template. Both PCR products were cloned into plasmid pCR2.1-TOPO (Invitrogen, Carlsbad, Calif.). A 384-bp NheI-AvrII fragment of nadD was replaced with a tetracycline resistance marker (generous gift from M. Smeltzer, University of Arkansas for Medical Sciences, Little Rock), and the resulting 4.0-kb DNA fragment containing the inactivated nadD was cloned into pBT2 (20). This plasmid was introduced into S. aureus RN4220 by electroporation. Following cointegration, resolution, and elimination of the plasmid (20), tetracycline-resistant (Tetr) colonies were screened for sensitivity to chloramphenicol. No Tetr Cms colonies were found in the case of nadD, while pncA was successfully inactivated in the control experiment. In this case, 38% of the Tetr colonies were also Cms, and all of the Tetr Cms colonies tested by PCR contained the correct pncA deletion in the chromosome.
The S. aureus coaD ortholog (gi|15926709) with flanking regions was amplified by PCR using primers 5′-GATTGCCAGTTGTAGGGTTCATA-3′ and 5′-GCGTTGGCTTAATCACAGAATA-3′, cloned into pBT2 (to produce plasmid pMF32), and subjected to in vitro transposon mutagenesis. To this end an artificial transposon was constructed (M. Farrell et al., unpublished data) by cloning the Tet M marker derived from Tn916 (38) into pMOD-2<MCS> (Epicentre Technologies). Plasmid pMF32 was mixed in a 1:1 molar ratio with Tet M-transposon DNA, and EZ::TN transposase (Epicentre Technologies) was added to a 0.1-U/μl final concentration. The mixture was incubated for 2 h at 37°C in reaction buffer (50 mM Tris-acetate [pH 7.5], 150 mM potassium acetate, 10 mM magnesium acetate, and 40 mM spermidine) and used to transform E. coli strain DH10B. One hundred Tetr E. coli colonies were screened by whole-cell PCR, and five different transposon-containing pMF32 plasmids were selected, including two plasmids containing coaD inactivated by a single transposon insertion in the middle of the gene and three plasmids with a transposon in open reading frames (ORFs) immediately upstream or downstream of coaD. All five plasmids were used in parallel for pBT2-driven allelic exchange in S. aureus RN4220 (see above). No Tetr Cms (knockout) colonies were obtained for coaD disruptions, while the neighboring ORFs (encoding gi|15926708 and gi|15926710) were successfully inactivated.
Representative bacterial target enzymes and their human counterparts were expressed in E. coli BL21(DE3) as N-terminal fusions with a six-His tag using the pProEX-HT3a system (Invitrogen). Corresponding DNA fragments were amplified by PCR using bacterial genomic DNA (American Type Culture Collection, Manassas, Va.) or human brain cDNA (Clontech) and primers generating an NcoI or BspHI site at the 5′ end and a SalI or PstI site at the 3′ end. The DNA fragments were purified, digested, and cloned between NcoI and SalI (or PstI) restriction sites in the vector. The resulting constructs were verified by DNA sequencing. Recombinant proteins were purified to homogeneity using a two-step procedure consisting of chromatography on Ni-nitrilotriacetic acid agarose (Qiagen) and gel filtration (Superdex G-200) as described previously (28). Preliminary enzymatic characterizations were performed for representative bacterial and human enzymes involved in the biosynthesis of NAD (NMN and NaMNAT), CoA (PPAT), and FAD (FADS) using high-performance liquid chromatography analysis of the reaction products and/or coupled enzymatic assays, such as described in reference 27. Enzymatic reactions were performed in the presence of 0.1 mM ATP and a 0.1 mM concentration of the respective substrates: NMN (or NaMN), 4′-phosphopantetheine, or flavin mononucleotide (FMN). High-performance liquid chromatography analysis was performed using ion-pair chromatography under isocratic conditions: 100 mM sodium phosphate buffer (pH 3.5) with 8 mM tetrabutylammonium bromide and methanol (8% for NaMNAT, 15% for PPAT, and 30% for FADS), on a column (50 by 4.6 mm [inner diameter]) packed with 5-μm C18 (Supelco).
Genetic footprinting (Fig. 1) was systematically applied to determine the genes required for logarithmic aerobic growth of E. coli MG1655 in enriched Luria-Bertani medium. Genetic footprinting of a limited chromosomal region was also performed with E. coli strain DH10B under identical experimental conditions.
Preliminary gene essentiality conclusions were made based on a semiautomatic analysis of the number and relative positions of inserts retained within each gene after selective outgrowth for 23 population doublings. Failure to recover inserts, or the presence of only a limited number of inserts at the very end of a coding sequence, suggested that cells carrying transposons in that gene were not viable under these growth conditions. However, this could also occur if a gene constituted a cold spot for transposition or if an insert had a polar effect on an essential downstream gene. We validated the technology with a combination of the following controls: (i) genetic footprinting of the mutagenized population prior to outgrowth (time zero sample) for a number of genes previously established to be essential (Fig. 2B), (ii) verification of consistency of preliminary essentiality assignments within the metabolic and biological context of each gene (see below), (iii) comparison of the observed essentiality with data reported in the literature, and (iv) genetic footprinting in the presence of a complementing DNA fragment (for an example, see reference 27).
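The preliminary rule described above (no inserts, or inserts only at the very end of the coding sequence) can be written as a simple classifier. This is a hedged sketch rather than the semiautomatic procedure actually used: the article does not state how close to the 3′ end an insert must lie to be disregarded, so the `tail_fraction` threshold of 10% is a hypothetical parameter, as are the function name and return labels.

```python
from typing import Iterable


def call_gene(start: int, end: int, strand: int, inserts: Iterable[int],
              tail_fraction: float = 0.10) -> str:
    """Preliminary essentiality call from mapped insert coordinates.

    start, end : gene coordinates in bp (start <= end, inclusive)
    strand     : +1 if the gene is read toward higher coordinates, -1 otherwise
    inserts    : genome coordinates of inserts retained after outgrowth
    """
    length = end - start + 1
    for pos in inserts:
        if not start <= pos <= end:
            continue  # insert lies outside this gene
        # Fractional distance of the insert from the translational start.
        rel = (pos - start) / length if strand > 0 else (end - pos) / length
        if rel < 1.0 - tail_fraction:
            return "nonessential"  # insert tolerated upstream of the 3' tail
    return "putatively essential"


if __name__ == "__main__":
    print(call_gene(1000, 1999, +1, []))
    print(call_gene(1000, 1999, +1, [1950]))
    print(call_gene(1000, 1999, +1, [1500]))
```

A call of "putatively essential" from such a rule remains provisional, since transposition cold spots and polar effects produce the same pattern; this is why the controls listed above are needed.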
Libraries of 2 × 10⁵ independent insertion mutants were generated in E. coli strains MG1655 and DH10B using Tn5-based EZ::TN<KAN-2>Tnp transposome (Epicentre Technologies) as a transposon delivery system. It utilizes a mutant hyperactive form of the Tn5 transposase, which has been reported to have low insertion site specificity (41, 64). In our study EZ::TN transposase was found to be sufficiently random to yield 1 to 14 independent insertions within a majority of E. coli genes (for example, see Fig. 2). The average insertion density was experimentally determined to be one insert per 250 bp (after outgrowth). However, the total number of 10⁵ independent insertion mutants analyzed after 23 cell divisions theoretically corresponds to one insert per 46 bp of genomic sequence (the E. coli genome size is 4,639 kb). We believe that this difference may be due to slight preferences in the target sequence recognition by the modified Tn5 transposase and can actually be used as a numerical measure of such a bias. In these data there is only a sixfold difference between the observed insertion density of one per 250 bp versus the frequency expected from completely random insertions (one per 46 bp). This insertion density allowed us to make essentiality assessments for 87% of E. coli ORFs, most of them larger than 80 amino acid residues in length.
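The density figures above follow from simple arithmetic, and the observed density also suggests why very short ORFs are hard to assess. The sketch below first reproduces the expected spacing, then estimates the chance that a nonessential gene carries no insert at all. The second step assumes that insertions follow a Poisson distribution at the observed density of one per 250 bp; that model is an illustrative assumption and is not used in the article.

```python
import math


def expected_spacing_bp(genome_bp: int, n_mutants: int) -> float:
    """Mean distance between inserts if every mutant carries one random insert."""
    return genome_bp / n_mutants


def prob_no_insert(gene_bp: float, spacing_bp: float = 250.0) -> float:
    """Chance that a gene of the given length receives no insert (Poisson assumption)."""
    return math.exp(-gene_bp / spacing_bp)


if __name__ == "__main__":
    spacing = expected_spacing_bp(4_639_000, 100_000)
    print(f"expected spacing: {spacing:.1f} bp")
    print(f"observed/expected ratio: {250 / spacing:.1f}")
    for aa in (80, 333):
        print(f"{aa}-residue ORF: P(no insert) = {prob_no_insert(3 * aa):.2f}")
```

The computed ratio is about 5.4, which the text rounds to sixfold. Under the Poisson assumption, an 80-residue ORF (240 bp) would lack inserts by chance in more than a third of cases, whereas a gene of about 1 kb would do so in roughly 2% of cases.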
To assess the reliability of our approach, we compared results from the first 50 min of the E. coli chromosome with gene essentiality data compiled from the literature by the Genetic Resource Committee of Japan (http://www.shigen.nig.ac.jp/ecoli/pec/Analyses.jsp). Of the 111 nonessential genes listed from this region of the chromosome, 98 genes (88%) were experimentally determined to be nonessential (retained inserts) by our procedure.
Of the 81 known essential genes located within the first 50 min of the E. coli chromosome, no transposon insertions were detected in 70 genes (86%) in our experiment. One gene out of the 81 analyzed, encoding essential cell division protein FtsK (11), contained 10 insertions. However, all of the inserts were clustered within the C-terminal half of the coding sequence, corresponding to amino acid residues 780 to 1329 (Fig. 2C). Only the N-terminal domain of FtsK is required for its role in cell division and viability (97). This suggests that genetic footprinting can be used to successfully detect (and even map) in vivo essential domains within ORFs, in cases where the essential region is immediately adjacent to a translational start. In the 10 remaining essential genes a single transposon insertion was detected after outgrowth in at least one of the strains (MG1655 or DH10B). This may be due to the fact that these genes can tolerate transposon inserts within certain restricted loci without a detrimental effect on the corresponding gene product. Similar phenomena have been reported in other genetic footprinting experiments in both E. coli (45) and H. influenzae (1).
To test the reproducibility of our approach, we generated a limited genetic footprint in E. coli strain DH10B (covering the first 650 kb of the chromosome) and compared it with the corresponding region in MG1655. Unambiguous essentiality data in both strains were obtained for 530 genes (excluding those deleted in DH10B). Identical assessments of essentiality were produced for 487 genes (92%) of this group. The results of genetic footprinting experiments in MG1655 and DH10B for genes controlling biosynthesis of NAD, CoA, and FAD are presented in Table 1. For the majority of these genes the two sets of data are in good agreement. Of 13 genes that lacked transposon insertions after outgrowth in MG1655, 12 did not contain any inserts in strain DH10B as well. The only exception was nadE, which contained a single insert in the DH10B experiment. Six out of seven nonessential genes in these pathways that have been analyzed in both strains contained inserts in both MG1655 and DH10B. Only panD lacked any inserts in DH10B, while in the MG1655 experiment it contained two. This may reflect the fact that in the DH10B experiment only transposons inserted in one of two possible orientations were monitored, while both orientations were mapped in the MG1655 experiment.
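The agreement figures quoted in this and the two preceding paragraphs are simple ratios of the reported counts, which the following minimal sketch recomputes. The labels and the helper name are illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Counts as reported in the text: (genes in agreement, genes compared).
COMPARISONS = {
    "known nonessential genes retaining inserts": (98, 111),
    "known essential genes lacking inserts": (70, 81),
    "identical calls in MG1655 and DH10B": (487, 530),
}

if __name__ == "__main__":
    for label, (part, whole) in COMPARISONS.items():
        print(f"{label}: {part}/{whole} = {percent(part, whole):.0f}%")
```

The recomputed values round to 88%, 86%, and 92%, matching the percentages given in the text.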
Comparison of the MG1655 and DH10B genetic footprints permitted detailed mapping of the two deletions in the DH10B chromosome: Δ(ara-leu)7697 (22) and ΔlacX74 (10, 89). Deletion Δ(ara-leu)7697 covers a 25.6-kb region, which corresponds to the area from 63.4 to 89.0 kb in the MG1655 genome and includes all the genes between polB and fruR (Fig. 2E). Deletion ΔlacX74 corresponds to the region from 340.3 to 369.5 kb (total of 29.2 kb) in the MG1655 sequence and includes all the genes between b0324 and b0347 and possibly mhpB (not shown).
Surprisingly, in many cases transposon insertions were detected upstream of known essential genes, where they are expected to destroy promoter sequences or otherwise have a polar effect on expression of downstream genes. This is the case with coaD (as shown in Fig. 2D), as well as with argS, frr, rplT, secA, and many other genes. We believe that the EZ::TN<KAN-2> transposon sequence inserted in either orientation is capable of initiating a level of transcription sufficient for cell survival in many cases, even though no specific promoter sequence was added to its structure. Analysis of the transposon sequence by the Neural Network Promoter Prediction program (95; http://www.fruitfly.org/seq_tools/promoter.html) reveals multiple putative E. coli-type promoters oriented outwards in both directions. This probably explains why polar effects of transposon insertions on downstream genes were rarely observed in our study. However, translational polarity on distal domains within genes is likely to be a significant limiting factor of this technique. In a multifunctional protein with an essential C-terminal domain, inserts in nonessential N-terminal regions of this protein will not be tolerated since they interfere with translation of downstream sequences. This limits the subgenic resolution of genetic footprinting to cases of multifunctional proteins with essential domains proximal to a translational start (as shown in Fig. 2C).
One of the major advantages of genetic footprinting over directed knockouts for identification of essential genes is the fact that footprinting is a fast and highly parallel method. As with any high-throughput technique, it has limitations, for example, due to cross-feeding during outgrowth in a complex mutant population. An efficient way to minimize erroneous conclusions with respect to the essentiality of a given gene is to consider the results in the context of the corresponding metabolic pathway or subsystem. Metabolic context analysis allows reconciliation of the data from genetic footprinting experiments, from the literature (if available), as well as predicted from metabolic reconstruction. This technique is most efficiently applied to E. coli, since it is one of the best-studied model microbial systems. When contradictions between these types of data are encountered for a certain gene, it is often possible to formulate the most probable assertion by inspecting the behavior of other E. coli genes within the same pathway.
The biosynthesis of adenylate cofactors provides a good example of the utility of this approach, since (i) these cofactors are essential metabolites in all types of organisms, (ii) the corresponding pathways in E. coli are rich in essential genes, and (iii) these subsystems have been thoroughly studied biochemically and genetically. On the other hand, some of the key genes were identified only recently, and their essentiality has not been directly confirmed. The results of genetic footprinting for the majority of the known E. coli genes involved with the biosynthesis and salvage pathways producing NAD(P), CoA, and FAD are summarized in Tables 2 to 4.
The behavior of the majority of NAD(P) biosynthetic genes in genetic footprinting experiments (Table 1) is consistent with previously published data and with the metabolic reconstruction of this system presented in Fig. 3A. As expected, all the genes of the de novo and salvage pathways appear nonessential, since in both experiments outgrowth occurred in rich media containing niacinamide. Nicotinamide riboside (NmR) is probably the most advanced NAD(P) precursor that can be transported by E. coli. Therefore, all the genes of the common pathway should be essential, since their inactivation cannot be compensated for by NAD(P) salvage. This is consistent with our observations, with the notable exception of nadD. Multiple insertions in this gene were observed, suggesting that nadD is dispensable for E. coli growth in rich media. This contradicted genetic data for Salmonella enterica serovar Typhimurium (48), as well as our own experiments in E. coli MG1655, which failed to produce a directed deletion of nadD in rich medium, even in the presence of 50 μM NAD or NMN. In our hands, nadD could only be successfully deleted in the presence of a plasmid containing a functional human nadD ortholog, pyridine nucleotide adenylyltransferase 1 (PNAT-1) (gi|12620200; O. Kurnasov, unpublished data).
One possible explanation of this contradiction is the presence of functional NadR in E. coli (Table 2). The nadR gene is a transcriptional regulator of the de novo biosynthesis and niacin salvage genes in E. coli (92). Recently NadR was shown to possess low levels of NMN adenylyltransferase activity (74) as well as NmR kinase activity (O. Kurnasov et al., unpublished data). This suggests that NadR may play a role in salvaging exogenous NMN. The NMN adenylyltransferase activity of NadR may also be involved in recycling intracellular NMN directly to NAD, bypassing the NAD synthase encoded by nadE (Fig. 3A). However, flux through the PnuC-NadR salvage pathway is unlikely to be sufficient to compensate for inactivation of nadD or nadE, which are responsible for the bulk of NAD production in E. coli (73). Also, this explanation contradicts the apparent essentiality of nadE observed in our genetic footprinting experiments, since the compensatory role of the PnuC-NadR bypass would imply nonessentiality of nadE as well as nadD (Fig. 3A). At present we cannot propose a single unambiguous interpretation of the contradiction between the classical genetic and genetic footprinting data with respect to nadD. However, this is the only point of disagreement in NAD(P) biosynthesis between the genetic footprinting data and other types of evidence.
In contrast to the complexity of NAD(P) metabolism, the CoA biosynthetic pathway is topologically simple (Fig. 3B). All of the genetic footprinting data are consistent with other experimental data and with theoretical considerations (Table 1). Each enzymatic step of the common pathway from pantothenate to CoA is nonredundant and indispensable, since none of the phosphorylated intermediates can be transported into the cell. This eliminates potential cross-feeding and the possibility of exogenous CoA salvage or transport of any phosphorylated precursors from the growth medium. All the genes of the pantothenate biosynthetic pathway are nonessential and can be bypassed by exogenous pantothenate via a known transporter encoded by the panF gene (51, 93).
The coaA gene of E. coli, encoding pantothenate kinase, the first enzyme of the common five-step pathway from pantothenate to CoA, was previously characterized as essential (86, 94). Genes for the last four enzymatic steps of CoA biosynthesis (coaBC, coaD, and coaE) have only recently been discovered (39, 65). Two of these genes, coaE (formerly yacE) and coaD (kdtB), were previously shown to be essential (36, 45). As shown in Table 1, all four genes were found to be essential in our footprinting study, as was acpS, which produces the enzyme responsible for covalent attachment of CoA to acyl carrier protein, which is required for fatty acid biosynthesis (33).
All of the FMN/FAD biosynthetic genes were found to be essential by genetic footprinting experiments in both E. coli strains (Table 1). The fact that the genes for de novo riboflavin biosynthesis were essential in the presence of riboflavin (0.8 mg/liter) in the medium is consistent with the absence of a riboflavin transporter in E. coli K-12 (6). Riboflavin auxotrophs can be obtained in E. coli using specific selection steps to facilitate riboflavin transport by uncharacterized mutations or on significantly higher concentrations of exogenous riboflavin (6, 8, 9).
Two enzymes of the common flavin pathway, consecutively producing the two cofactors, FMN and FAD, form a bifunctional protein encoded by the essential ribF gene. Both enzymatic activities are expected to be indispensable, although genetic footprinting data alone suggest only the essentiality of the C-terminal domain (encoding FK). This is a general limitation of the technique with respect to multifunctional proteins.
In summary, 13 genes encoding 16 enzymes within three vitamin/cofactor biosynthetic pathways were shown to be required for aerobic growth of E. coli in enriched media (Table 1). Twelve of these genes were identified as essential by genetic footprinting in agreement with published data and theoretical analysis. The only contradictory case (nadD) was reconciled by a directed gene deletion strategy. In addition, 12 more genes related to metabolism of NAD(P), CoA, and FAD were analyzed by genetic footprinting and found to be nonessential (Table 1). Again, these results are in good agreement with the available experimental data and metabolic reconstructions of the corresponding pathways.
Some essential genes representing potential broad-spectrum antibacterial targets were further analyzed with respect to the genomic data available for selected bacterial pathogens and for the human host (see below). This analysis allows extrapolation of the experimental gene essentiality data from E. coli onto several bacterial pathogens, including S. aureus.
The predicted essentiality of the S. aureus nadD ortholog (gi|15927174) was confirmed by an indirect approach. Deletion of nadD was attempted in parallel with the S. aureus pncA ortholog (gi|15927492), presumed to be dispensable in the presence of exogenous nicotinate, using allelic exchange vector pBT2 (20). No ΔnadD variants were obtained, even after extensive screening (2,248 colonies), while in the control experiment pncA was successfully inactivated in 38% of the colonies screened. This indicates that nadD is indeed essential for S. aureus growth in complex tryptic soy broth (Difco Laboratories, Detroit, Mich.).
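The strength of this indirect evidence can be illustrated with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each screened colony is independent and that a dispensable nadD would be inactivated at the same 38% rate as the pncA control; neither assumption is stated in the original work, and the function name is hypothetical. The probability is computed in log space because the value is far too small to represent as an ordinary float.

```python
import math


def log10_prob_no_mutants(n_colonies, success_rate):
    """log10 of the chance that none of n independent colonies carries the
    deletion, if each colony had probability `success_rate` of carrying it.
    (Independence and a shared rate are simplifying assumptions.)"""
    return n_colonies * math.log10(1.0 - success_rate)


# 2,248 colonies screened for the nadD deletion; 38% success for the pncA control.
log10_p = log10_prob_no_mutants(2248, 0.38)
print(f"log10(P of observing zero deletions) = {log10_p:.1f}")
```

Under these assumptions the exponent is roughly -467, so recovering no ΔnadD variants by chance alone would be effectively impossible if the gene were dispensable.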
A different strategy was used to verify the predicted essentiality of the S. aureus coaD ortholog (gi|15926709). In vitro transposon mutagenesis was used to disrupt coaD as well as two adjacent ORFs. The conclusion of the coaD essentiality was drawn based on the fact that coaD could not be disrupted while the neighboring ORFs (gi|15926708 and gi|15926710) were successfully inactivated by the Tet M transposon, with frequencies of 83 and 5%, respectively.
Selected bacterial targets and their human counterparts were overproduced in E. coli, purified, and characterized. The respective adenylyltransferase activities of the following recombinant enzymes were directly confirmed in vitro: NaMNATs from E. coli, S. aureus, and H. pylori (gi|1786858, gi|15924584, and gi|2314504, respectively); two isoforms of human PNAT, i.e., PNAT-1 (gi|12620200) and PNAT-2 (gi|11245478); PPATs from S. aureus, M. tuberculosis, H. influenzae, and H. pylori (gi|15924115, gi|560525, gi|1573650, and gi|15612433, respectively); the PPAT domain of the human PPAT/DPCK protein (27); FADS domain of the B. subtilis bifunctional FK/FADS protein (gi|16078730); and human FADS (F. Mseeh, unpublished results).
Our approach to identifying and ranking antibacterial drug targets based on a combination of genetic footprinting in a model system (E. coli) and comparative genome analysis is schematically illustrated in Fig. 4. Both major components of this approach, experimental and computational, are postgenomic techniques. Genetic footprinting in E. coli allows identification of essential genes as a function of growth conditions in a high-throughput format. This relatively new technique has multiple potential applications to functional genomics. For example, by monitoring changes in the pattern of gene essentiality while varying growth conditions and/or genetic background, it may be possible to assess various aspects of cell metabolism. This can allow functions of previously uncharacterized genes to be established and the functional roles of known genes to be refined. An example of such a study has been published recently (7). Here we consider application of genetic footprinting to identifying novel antibacterial drug targets (Fig. 4).
Experimental gene essentiality data determined by genetic footprinting are analyzed in terms of the relevant E. coli metabolic subsystems and pathways using the ERGO database. Such a contextual analysis refines the essentiality assessment of each gene by addressing the following questions. Which metabolic pathway is this essential gene associated with? Does this pathway yield a metabolite essential for cell viability? Is there another route to produce the same metabolite? Can this metabolite be salvaged from the growth media in its final form? What is the most advanced precursor (the last point of salvage) that a cell can acquire from its environment? This kind of analysis generates hypotheses about the essentiality and nonessentiality of other known genes in the same pathways. Comparison of these models with actual genetic footprinting data and the data available from the literature (e.g., http://www.shigen.nig.ac.jp/ecoli/pec/Analyses.jsp) allows refinement of initial experimental assessments of gene essentiality or calls for further experimental examination (as in the case of nadD).
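The reasoning behind these questions can be sketched in code for the simplest case, a linear nonredundant pathway. The example below is a minimal illustration and not the ERGO-based procedure itself: the function names, the placeholder gene `pan_de_novo` (standing in for the de novo pantothenate genes), and the intermediate names are assumptions introduced here. Genes at or upstream of the last point of salvage are predicted to be nonessential in a medium that supplies that precursor, genes downstream are predicted to be essential, and any disagreement with the footprinting data is flagged for follow-up.

```python
def predict_essentiality(pathway, last_salvage_point=None):
    """Predict essentiality for an ordered, nonredundant linear pathway.

    pathway: list of (gene, product) tuples in biosynthetic order.
    last_salvage_point: most advanced product the cell can take up from
        the medium, or None if nothing in this segment can be salvaged.
    """
    products = [product for _, product in pathway]
    cutoff = products.index(last_salvage_point) if last_salvage_point in products else -1
    return {
        gene: "nonessential" if i <= cutoff else "essential"
        for i, (gene, _) in enumerate(pathway)
    }


def find_conflicts(predicted, observed):
    """Genes whose observed footprinting call disagrees with the prediction."""
    return sorted(g for g in predicted if g in observed and predicted[g] != observed[g])


# CoA pathway: pantothenate is the last point of salvage.
coa_pathway = [
    ("pan_de_novo", "pantothenate"),        # hypothetical placeholder gene
    ("coaA", "4'-phosphopantothenate"),
    ("coaBC", "4'-phosphopantetheine"),
    ("coaD", "dephospho-CoA"),
    ("coaE", "CoA"),
]
coa_predicted = predict_essentiality(coa_pathway, "pantothenate")
coa_observed = {"pan_de_novo": "nonessential", "coaA": "essential",
                "coaBC": "essential", "coaD": "essential", "coaE": "essential"}

# Simplified segment of the common NAD pathway: no salvage within this segment.
nad_predicted = predict_essentiality([("nadD", "NaAD"), ("nadE", "NAD")])
nad_observed = {"nadD": "nonessential", "nadE": "essential"}

print(find_conflicts(coa_predicted, coa_observed))  # no conflicts expected
print(find_conflicts(nad_predicted, nad_observed))  # nadD is flagged
```

A flagged gene is not automatically reclassified; as with nadD, the conflict is a prompt for further experimental examination.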
Once consistent data are compiled for E. coli, the metabolic reconstruction of selected pathways is extended to a panel of representative pathogens. A hierarchical overview illustrated in Tables 2 to 4 provides a convenient way to perform a comparative cross-genome analysis of relevant pathways and metabolic subsystems. Questions addressed at this step of the analysis include the following. Are these pathways present in all the pathogens under consideration? Which pathways containing essential genes are conserved and nonredundant within the group? Which functions within the selected pathways are encoded by closely related genes? The last point may reflect the likelihood of finding a small molecule that will efficiently bind to a target protein in most pathogens of the spectrum, suppressing its functional activity. We use two parameters as preliminary measures of this likelihood: the P values of pairwise sequence comparisons and the E values produced by comparing each bacterial sequence to the profile (HMM) built for a specific panel of pathogens (as illustrated in Fig. 5A and B). With more structural and functional data, more elaborate and productive criteria may be applied, such as conservation of the elements of the active site, relative substrate preferences, or affinities for a range of natural and synthetic ligands.
Next, the corresponding human pathways are reconstructed from genomic data to assess potential side effects that may arise due to inhibition of the human counterparts of antibacterial targets. One criterion often used in selecting antibacterial targets is the absence of the corresponding function and even the corresponding pathway in the human host (for instance, bacterial cell wall biosynthesis). This approach, although historically productive, often unnecessarily rejects promising targets. Many targets in universal biosynthetic pathways can now be reconsidered because of the new opportunities provided by comparative genomics and parallel chemical synthesis. Certain features revealed at the level of host-pathogen comparative genome analysis may be useful for evaluation of such targets. For instance, a pathway may be unique in pathogens but potentially redundant in humans (as with NAD biosynthesis), or the human functional counterpart may be structurally unrelated to bacterial proteins. In cases where bacterial orthologs of a target enzyme and the human version share some sequence similarity, we prioritize targets based on (i) the lowest of the P values generated from pairwise comparisons of target protein homologs from each bacterial pathogen with their human functional counterpart and (ii) the E value derived from comparison of a human protein sequence with the profile HMM built for the corresponding pathogen proteins (as in Fig. 5C).
To illustrate this approach, essential and conserved genes with known functions in the biosynthesis of the key adenylate cofactors NAD(P), CoA, and FAD were investigated. These biosynthetic pathways fit most of the criteria listed above: (i) each cofactor is utilized by multiple enzymes in all pathogens; (ii) most bacteria are unable to import these cofactors or their phosphorylated precursors without prior degradation; (iii) the last steps in each of the biosynthetic pathways are universal and nonredundant and contain a number of promising targets.
NAD(P) biosynthesis is characterized by a high level of complexity and diversity in various organisms (60, 66), including two versions of de novo biosynthesis and at least three routes for salvage of exogenous precursors (Table 2 and Fig. 3A). The absence of any recognizable genes for NAD(P) biosynthesis in the genome of C. trachomatis (Table 2) suggests that this obligate intracellular parasite has unique transport machinery for salvage of NAD and NADP. Therefore, C. trachomatis must be excluded from the panel of pathogens used to assess targets in NAD biosynthesis.
Genes controlling de novo NAD biosynthesis from aspartate and the salvage of niacin in E. coli are nonessential in rich media. This conclusion can be projected onto P. aeruginosa, B. anthracis, and possibly M. tuberculosis based on metabolic reconstructions (Table 2). The last of these cases is controversial. In spite of the fact that both genes required for niacinamide salvage (pncA and pncB) are present in the genome, M. tuberculosis has been reported to be strictly dependent on de novo NAD biosynthesis (83). Since pncA is expressed in M. tuberculosis (81, 102), one interpretation is that the pncB ortholog is functionally inactive in M. tuberculosis, at least under laboratory conditions.
Theoretically, bacterial de novo NAD biosynthesis from aspartate is an attractive target pathway, since it is absent in humans, where de novo biosynthesis occurs by a five-step oxidative degradation of tryptophan (66). However, within our set of bacterial pathogens, only H. pylori is expected to have useful targets in this pathway, since it lacks niacin salvage genes (Table 2). The two-step conversion of aspartate to quinolinate would make an excellent target pathway for anti-infective agents specific to H. pylori (Table 2 and Fig. 3A). A systematic analysis of essential genes conserved exclusively in H. pylori (23) did not recognize the potential of this pathway as a target for drug development, because it did not take into account the metabolic context of this organism.
The PnuC-NadR pathway of nicotinamide riboside salvage in H. influenzae provides another example of a narrow spectrum target. The complete pathway is present in only a limited number of bacteria, but it is the sole route for NAD biosynthesis in H. influenzae. H. influenzae requires so-called V factors for growth (25, 76), the simplest of which is nicotinamide riboside. All other acceptable V factors, NADP, NAD, and NMN (if present in the growth medium), are gradually degraded to nicotinamide riboside by membrane-bound or periplasmic phosphohydrolases, such as the nadN gene product (76). In addition, the two-step niacinamide salvage pathway in gram-positive pathogens such as S. pneumoniae and S. aureus may constitute a possible narrow-spectrum target. Both organisms lack all the genes for de novo NAD biosynthesis and for the PnuC-NadR pathway of nicotinamide riboside salvage (Table 2).
The three-step pathway from NaMN to NAD and NADP, requiring NaMNAT, NADS, and NADK, is conserved in the majority of bacterial pathogens (Table 2), representing potentially broad-spectrum targets. This pathway is nonredundant, and the corresponding genes are essential in E. coli (Table 2). NaMNAT and NADS are not present in C. trachomatis and H. influenzae, which excludes these pathogens from the target profile. Inhibition of NADK would cover the broadest range of pathogens, excluding only C. trachomatis (see Fig. 5A). However, NADK has been recently characterized in H. sapiens (57), revealing a very high sequence similarity with many bacterial enzymes (the P value of 2·10⁻²⁵ for NADK from P. aeruginosa is significantly better than P values of pairwise comparison between some bacterial orthologs). The C-terminal synthase domain of the human NADS ortholog (gi|10433831) reveals a much lower but significant sequence similarity with some of its bacterial counterparts (the best P value is 6·10⁻¹³ for NADS from M. genitalium). Finally, the human counterpart of bacterial NaMNAT has only marginal sequence similarity with bacterial enzymes of the nadD family (the best P value is ~0.2 for NaMNAT from H. pylori). E values derived from comparing human PNAT, NADS, and NADK to profile HMMs of the corresponding bacterial protein families support these conclusions (Fig. 5C). Prioritization of the three potentially broad-spectrum targets in NAD biosynthesis by a combination of these and other criteria summarized in Fig. 5 yields a target preference order as follows: NaMNAT is better than NADS and much better than NADK.
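The selectivity criterion used here can be sketched as a simple ranking. The example below uses only the best (lowest) pairwise P values against the human counterpart quoted in this paragraph, with the NaMNAT value taken as the approximate 0.2; a higher best P value means less similarity to the human protein and therefore, under this single criterion, a more selective target. The function and variable names are hypothetical, and the actual prioritization combines this measure with the HMM E values and the other criteria summarized in Fig. 5.

```python
# Best pairwise P value of each bacterial target family against its human
# functional counterpart, as quoted in the text (NaMNAT value is approximate).
best_p_vs_human = {
    "NaMNAT": 0.2,    # H. pylori NaMNAT vs. human PNAT
    "NADS": 6e-13,    # M. genitalium NADS vs. human NADS synthase domain
    "NADK": 2e-25,    # P. aeruginosa NADK vs. human NADK
}


def rank_by_selectivity(p_values):
    """Order targets from least to most similar to the human counterpart,
    i.e. from highest to lowest best P value (single-criterion sketch)."""
    return sorted(p_values, key=p_values.get, reverse=True)


ranking = rank_by_selectivity(best_p_vs_human)
print(" > ".join(ranking))
```

With these three values the ranking reproduces the preference order given in the text.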
NaMNAT activity was initially characterized in E. coli (26), and the chromosomal locus was mapped in Salmonella (48). However, the identity of the structural gene remained unknown until recently. We have identified this gene based on its clustering on the chromosome with nadE and pncB gene orthologs in some microbial genomes (12). This gene has been independently identified in E. coli by Mehl and coworkers (63). We have cloned and expressed nadD from E. coli and have cloned and expressed its orthologs from gram-positive (S. aureus) and gram-negative (H. pylori) pathogens (12), and we have purified the NadD proteins. Analysis of the substrate specificity of the purified recombinant proteins revealed strong preferences among the bacterial enzymes for NaMN over NMN (up to >17,000-fold in the case of S. aureus NaMNAT). This bias is consistent with the major flux in NAD biosynthesis of most bacteria going through nicotinic acid mononucleotide (NaMN) to NaAD.
At least three isoforms of NadD can be identified in the human genome (gi|12620200, gi|11245478, and gi|14029540). We have cloned, purified, and characterized the first two forms and designated them PNAT-1 and PNAT-2, since both of them perform the adenylyltransferase reaction equally efficiently with both pyridine nucleotides NMN and NaMN (O. Kurnasov et al., unpublished data). Human PNAT-2 has been independently identified and characterized by two other groups (31, 80). The observed dual specificity of human PNATs is consistent with previous experimental data (61) and with the reconstruction of the putative human NAD biosynthetic pathway from genomic data (Table 2).
The difference in substrate preferences between human PNATs and bacterial NaMNATs is additional incentive for pursuing NadD as a selective, yet relatively broad-spectrum antibacterial target. The three-dimensional structures of NaMNATs from E. coli (101) and B. subtilis (70) have been solved in association with NaMN. Comparison of these structures with the structure of human PNAT-2 (103) reveals significant differences in the organization of their active sites, which may facilitate identification of a selective antibacterial inhibitor.
Similar comparative analysis of the CoA and FMN/FAD biosynthetic pathways in humans and bacteria produced additional broad-spectrum antibacterial targets (Fig. 5). The de novo pathway for CoA biosynthesis is present in many bacterial pathogens and absent in humans (Table 3), but the corresponding enzymes do not seem to be attractive targets due to the universal presence of pantothenate symporter orthologs. Some pathogens that lack de novo pantothenate biosynthetic genes (Table 3) depend solely on salvage of pantothenate (S. pneumoniae and H. influenzae) or dephospho-CoA (M. genitalium and C. trachomatis). Bacterial pantothenate symporters (panF family) share high sequence similarity with a human protein implicated in vitamin transport (96). No panF orthologs are present in the genomes of M. genitalium or C. trachomatis. Specific transporters allowing salvage of host dephospho-CoA by these intracellular pathogens (which might represent narrow spectrum targets) are still unknown.
All five enzymatic steps in the common pathway of CoA biosynthesis from pantothenate are encoded by essential genes in E. coli (Table 3). Four of these enzymes are conserved in a broad range of bacterial pathogens. The first enzyme of the pathway, PK, encoded in E. coli by coaA (86) belongs to a structural family different from that of the recently identified eukaryotic PKs (21). This makes coaA-related PKs an attractive target for a relatively narrow subset of our list of pathogens, including H. influenzae, M. tuberculosis, and S. pneumoniae. S. aureus (as well as other staphylococci and enterococci) and B. anthracis lack coaA orthologs, and this enzymatic function is most likely performed by remote orthologs of the eukaryotic PK (such as tr|Q99SC8 in S. aureus). Neither of the two known forms of PK can be identified by sequence similarity analysis in P. aeruginosa, H. pylori, and a few other pathogens not included in our set, which suggests the existence of at least one uncharacterized form of this enzyme.
All the remaining enzymes in this pathway represent potential antibacterial drug targets of relatively broad spectrum (Table 3). In most bacteria, two enzymatic activities, PPCS and PPCDC, are located within the C-terminal and N-terminal domains of a bifunctional (fused) protein (gene coaBC, previously dfp in E. coli). The exception in our list of pathogens is S. pneumoniae (as well as other streptococci and enterococci) containing two separate ORFs, coaB and coaC, in one operon (in reversed order relative to the fusion proteins). Interestingly, B. anthracis (as well as Bacillus cereus) contains both a bifunctional PPCDC/PPCS and a monofunctional PPCS ortholog but no monofunctional PPCDC. Monofunctional PPCSs from S. pneumoniae and B. anthracis reveal very high sequence similarity to each other, but they are quite divergent from the PPCS domain of the bifunctional PPCDC/PPCS proteins. In contrast to other bacterial orthologs, they produce a low but reliable similarity score compared with the human monofunctional PPCS (3·10⁻²³ between H. sapiens and B. anthracis). The last enzyme of the common pathway, DPCK, is ubiquitous in all pathogens in our set.
The last four enzymes in human CoA biosynthesis were recently identified and characterized (27). Two of them, PPCDC and DPCK, are relatively similar to their bacterial counterparts (Fig. 5C). The human monofunctional PPCS is very distant from PPCS domains of bacterial bifunctional PPCDC/PPCS proteins, but it is more closely related to the bacterial monofunctional PPCSs. The PPAT domain of the human bifunctional PPAT/DPCK protein reveals no significant sequence similarity with any bacterial counterpart, and it is likely to be quite dissimilar in the overall structure. PPAT probably represents the most attractive target in the common CoA biosynthetic pathway for a number of reasons: (i) it constitutes a target in a broad range of pathogens (Fig. 5A), (ii) all the bacterial orthologs of this enzyme are closely related to the consensus (no outliers; Fig. 5B), and (iii) the human PPAT is very dissimilar from the bacterial HMM profile (Fig. 5C). PPAT from E. coli (encoded by coaD, previously kdtB) was previously described, and its three-dimensional structure was solved (39, 50).
The majority of bacteria (with the exception of M. genitalium) considered here have a conserved de novo riboflavin biosynthetic pathway, which is not present in humans (Table 4 and Fig. 3C). Moreover, all of the genes in this de novo pathway are essential in E. coli (Table 4). These observations have brought attention to many of these gene products as potential antibacterial drug targets (17, 29, 98). However, direct experimental analysis of riboflavin transport and salvage in these pathogens is necessary in order to define how useful these targets may be. B. subtilis is known to effectively salvage exogenous riboflavin, bypassing the requirement for the de novo pathway (19). Orthologs of one of the riboflavin transporters, gene ypaA, recently identified in B. subtilis (56) are present in many gram-positive pathogens, including B. anthracis, S. aureus, and S. pneumoniae, although their functionality has not been directly confirmed. Additional structurally dissimilar riboflavin transporters are likely to exist in other pathogens. For example, riboflavin transport must occur in M. genitalium, since a de novo riboflavin biosynthesis pathway cannot be asserted in this organism (Table 4). However, no ypaA orthologs can be found in any of the available mycoplasma or ureaplasma genomes.
A common pathway for FMN/FAD biosynthesis is represented in all bacteria by a bifunctional enzyme that catalyzes two consecutive reactions. The first reaction is formation of FMN, catalyzed by the FK domain (C terminus). The second reaction is conversion of FMN to FAD by the FADS domain (N terminus). Both enzymes represent antibacterial targets with a very broad range (Fig. 5A). The HMM E values in the group of pathogens as a whole are high (Fig. 5B), indicating that these targets are rather divergent. However, for FADS this parameter can be improved by removing some outliers, such as H. pylori (often the source of the most divergent proteins within our sample group of pathogens) and M. genitalium (Fig. 5B).
In contrast to those in all known bacteria, eukaryotic FK and FADS identified in S. cerevisiae (78, 100) are monofunctional proteins. We have overexpressed, purified, and characterized both human enzymes of the common FMN/FAD biosynthetic pathway (F. Mseeh, unpublished data). Sequence comparison of human FK and FADS with the profile HMMs clearly indicates that FADS is potentially a more selective target (Fig. 5C). In agreement with this, no reliable P values are produced in pairwise comparisons of human FADS with any of its bacterial counterparts, while the human FK has sequence similarity to some bacterial FK domains higher than the similarity between some of the bacterial FK domains (for example, the M. tuberculosis FK domain gives a P value of 10⁻⁹ with human FK and a P value of 4·10⁻⁶ with the H. pylori FK domain).
In conclusion, genetic footprinting in E. coli in combination with comparative genome analysis of reconstructed metabolic subsystems and pathways has allowed us to identify a number of antibacterial targets involved in the biosynthesis of the adenylate cofactors NAD(P), CoA, and FAD. Among the most attractive targets in all three pathways are the adenylyltransferases: NaMNAT, PPAT, and FADS (Fig. 6). Two of these targets (NaMNAT and PPAT) have been validated by directed-knockout strategy in S. aureus. Some orthologs of the target enzymes from representative pathogens, as well as their human counterparts, were overproduced, purified, and characterized by direct in vitro assays. Comparative analysis of adenylate cofactor biosynthesis provides an illustration of a general approach that can be extended to other functional subsystems to reveal novel antibacterial targets.
While the manuscript was being reviewed, a genome-wide analysis of essential genes in H. influenzae was published (2). Comparing essential genes in divergent organisms will be useful “for defining both the common essential pathways of life and potential targets for development of antimicrobial therapeutics” (2). Our genome-scale list of essential and nonessential genes in E. coli is currently in preparation.
We are grateful to William Reznikoff and Igor Goryshin (University of Wisconsin, Madison); Jerry Jendrisak and Les Hoffman (Epicentre Technologies); Tadhg Begley (Cornell University, Ithaca, N.Y.); Zoltan Oltvai (Northwestern University, Chicago, Ill.); and Henry Burd, Niels Larsen, Gregory Kogan, Yuri Kuniver, Yuri Grechkin, and Robert Haselkorn (Integrated Genomics) for valuable discussions, help, and support throughout this project. We thank George Church (Harvard Medical School, Boston, Mass.) for the gift of pKO3, Mark Smeltzer (University of Arkansas for Medical Sciences) for the gift of pCRII::tet and RN4220, Reinhold Brückner (Universität Tübingen, Tübingen, Germany) for pBT2, and Don Clewell (University of Michigan, Ann Arbor) for the plasmid pAM120 containing Tn916.