This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Folate synthesis and salvage pathways are relatively well known from classical biochemistry and genetics but they have not been subjected to comparative genomic analysis. The availability of genome sequences from hundreds of diverse bacteria, and from Arabidopsis thaliana, enabled such an analysis using the SEED database and its tools. This study reports the results of the analysis and integrates them with new and existing experimental data.
Based on sequence similarity and the clustering, fusion, and phylogenetic distribution of genes, several functional predictions emerged from this analysis. For bacteria, these included the existence of novel GTP cyclohydrolase I and folylpolyglutamate synthase gene families, and of a trifunctional p-aminobenzoate synthesis gene. For plants and bacteria, the predictions comprised the identities of a 'missing' folate synthesis gene (folQ) and of a folate transporter, and the absence from plants of a folate salvage enzyme. Genetic and biochemical tests bore out these predictions.
For bacteria, these results demonstrate that much can be learnt from comparative genomics, even for well-explored primary metabolic pathways. For plants, the findings particularly illustrate the potential for rapid functional assignment of unknown genes that have prokaryotic homologs, by analyzing which genes are associated with the latter. More generally, our data indicate how combined genomic analysis of both plants and prokaryotes can be more powerful than isolated examination of either group alone.
Folates are tripartite molecules comprising pterin, p-aminobenzoate (pABA), and glutamate moieties to which one-carbon units at various oxidation levels can be attached at the N5 and N10 positions (Figure 1). In natural folates the pterin ring is in the dihydro or tetrahydro state, and a short, γ-linked polyglutamyl tail of up to about eight residues is usually attached to the first glutamate.
Tetrahydrofolates serve as cofactors in one-carbon transfer reactions during the synthesis of purines, formylmethionyl-tRNA, thymidylate, pantothenate, glycine, serine, and methionine (Figure 2). Most folate-dependent enzymes strongly prefer polyglutamates to monoglutamates, but the opposite is usually true of folate transporters, so polyglutamylation is generally considered to favor folate retention within cells and subcellular compartments [2,3].
Plants, fungi, certain protists, and most bacteria make folates de novo, starting from GTP and chorismate, but higher animals lack key enzymes of the synthetic pathway and so require dietary folate [4-7]. Folates are crucial to human nutrition and health, and antifolate drugs are widely used in cancer chemotherapy and as antimicrobials [3,7,8]. For these reasons, folate synthesis and salvage pathways have been extensively characterized in model organisms, and the folate synthesis pathway in both bacteria and plants has been engineered in order to boost the folate content of foods [9-11].
The de novo folate synthesis pathway has the same steps in bacteria and plants, and consists of a pterin branch and a pABA branch (Figure 3, rose and blue color, respectively). The first enzyme of the pterin branch is GTP cyclohydrolase I (GCHY-I, EC 3.5.4.16), which catalyzes a complex reaction in which the five-membered imidazole ring of GTP is opened, C8 is expelled as formate, and a six-membered dihydropyrazine ring is formed using C1 and C2 of the ribose moiety of GTP. The resulting 7,8-dihydroneopterin triphosphate is then converted to the corresponding monophosphate by a specific pyrophosphatase [5,12]. Removal of the last phosphate is believed to be mediated by a non-specific phosphatase. Dihydroneopterin aldolase (DHNA, EC 4.1.2.25) then releases glycolaldehyde to produce 6-hydroxymethyl-7,8-dihydropterin, which is then pyrophosphorylated by hydroxymethyldihydropterin pyrophosphokinase (HPPK, EC 2.7.6.3). DHNA also interconverts 7,8-dihydroneopterin and 7,8-dihydromonapterin, and cleaves the latter to 6-hydroxymethyl-7,8-dihydropterin. A paralog of DHNA, FolX, interconverts the triphosphates of 7,8-dihydroneopterin and 7,8-dihydromonapterin, and also catalyzes the same reactions as DHNA at very slow rates.
In the pABA branch of the pathway, chorismate is aminated to aminodeoxychorismate (ADC) by ADC synthase (EC 2.6.1.85) using the amide group of glutamine as amino donor. ADC is then converted to pABA by ADC lyase (EC 4.1.3.38).
6-Hydroxymethyl-7,8-dihydropterin pyrophosphate and pABA moieties are condensed by dihydropteroate synthase (DHPS, EC 2.5.1.15). The resulting dihydropteroate is glutamylated by dihydrofolate synthase (DHFS, EC 6.3.2.12) giving dihydrofolate (DHF), which is reduced by dihydrofolate reductase (DHFR, EC 1.5.1.3) to tetrahydrofolate (THF). Folylpolyglutamate synthase (FPGS, EC 6.3.2.17) then adds a γ-glutamyl tail. In Escherichia coli, it has been reported that there can also be α linkages in the distal part of the polyglutamyl tail.
Although the biosynthetic steps are the same in plants and bacteria, the plant pathway is split between three subcellular compartments, with pterin synthesis in the cytosol, pABA synthesis in chloroplasts, and the other steps in mitochondria (Figure 4). FPGS isoforms are present in all three of these compartments, as are folates themselves [15,16]. Folates – both poly- and monoglutamates – are also found in plant vacuoles. The highly compartmented nature of folate synthesis in plants implies the existence of pterin and folate transporters that are integral components of the pathway.
Folate-related salvage pathways are of three kinds. The first ('intact folate salvage') (Figure 3, green color) enables utilization of supplied folic acid and DHF, and relies on a DHFR activity to reduce these oxidized folates to THF, and on an FPGS activity. DHFR activity is also required to recycle the DHF produced in the reaction catalyzed by thymidylate synthase (TS, EC 2.1.1.45). The second kind of salvage ('pterin salvage') (Figure 3, yellow color), known in Leishmania and other trypanosomatid parasites, involves the reduction of fully oxidized pterins to the dihydro and tetrahydro levels by pteridine reductase 1 (PTR1, EC 1.5.1.33). This enables oxidized pterins to be used (after reduction to dihydro forms) for folate synthesis, and (after reduction to tetrahydro forms) as cofactors for aromatic hydroxylases and other pterin-dependent enzymes. Finally, some bacteria, plants, and protists probably carry out a more radical kind of salvage, in which the pterin and pABA-glutamate fragments produced by folate breakdown are recycled for folate synthesis. This type of salvage has been little studied and will not be considered further in this article.
Genes for all the enzymes of folate synthesis have been identified in model organisms such as Escherichia coli, Saccharomyces cerevisiae, and Arabidopsis thaliana [4-6]. Likewise, the intact folate salvage pathway has been well characterized in mammals, the malaria parasite Plasmodium, and Lactobacillus casei [7,19,20], and pterin salvage in Leishmania. However, analysis of the distribution of known folate synthesis and salvage genes in hundreds of bacterial genomes using the SEED platform reveals that much remains to be learnt about both synthesis and salvage.
The SEED is a freely available, open-source database that provides efficient ways to discover new genes or pathways, to generate predictions about gene function, and to improve annotations, based on a 'functional subsystem approach' . This approach has much in common with metabolic reconstruction [22,23]. A functional subsystem may be defined as a set of functional roles (usually ten to twenty) jointly involved in a biological process. A typical subsystem is a group of enzymes, transporters, and regulatory components that participate in a metabolic pathway such as folate synthesis or salvage. Subsystem analysis examines which components are actually present in a genome and which should be present but cannot be identified, and so provides a picture of what is actually missing. This sets the stage to pursue the 'missing genes', also termed 'pathway holes' [24-26]. Homology-based searches alone are usually unable to locate missing genes that have not been previously identified in any genome ('globally missing genes') or those that are missing due to non-orthologous gene replacement ('locally missing genes') .
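The core of the subsystem idea, listing the functional roles a pathway needs and asking which ones have no annotated gene in a given genome, can be sketched in a few lines. This is only an illustration and not the SEED API: the role list is taken from the enzymes named in this article, while `pathway_holes` and the example genome are hypothetical.

```python
# Illustrative sketch only; this is NOT the SEED API.
# Roles of de novo folate synthesis, in pathway order, as named in the text.
FOLATE_SYNTHESIS_ROLES = [
    "GCHY-I",
    "DHNTP pyrophosphatase",
    "DHNA",
    "HPPK",
    "DHPS",
    "DHFS",
    "DHFR",
    "FPGS",
]


def pathway_holes(annotated_roles, required_roles=FOLATE_SYNTHESIS_ROLES):
    """Return the required roles that have no annotated gene, in pathway order."""
    present = set(annotated_roles)
    return [role for role in required_roles if role not in present]


# Hypothetical genome: everything annotated except the first two pterin-branch steps.
example_genome = {"DHNA", "HPPK", "DHPS", "DHFS", "DHFR", "FPGS"}
print(pathway_holes(example_genome))
# ['GCHY-I', 'DHNTP pyrophosphatase']
```

A hole found this way is only a starting point: it does not say whether the gene is globally missing, locally missing, or simply mis-called, which is where gene clustering, fusions, and phylogenetic distribution come in.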
In this study, we first predicted the pathways (de novo folate synthesis, intact folate salvage, and pterin salvage) present in around four hundred sequenced bacteria and identified cases of missing genes for almost every step of the synthesis pathway. Candidates for such missing genes in bacteria and plants were then predicted using comparative genomic tools and representative candidates were tested experimentally.
As folate-dependent formylation of the initiator tRNA is a hallmark of bacterial translation and bacteria cannot import formylmethionyl-tRNA, we investigated the distribution of the fmt gene encoding methionyl-tRNA formyltransferase (EC 2.1.2.9) as a signature gene for a folate requirement. Homologs of fmt are found in all sequenced genomes except Mycoplasma hyopneumoniae and Onion yellows phytoplasma OY-M (Table 1). We confirmed the observation that M. hyopneumoniae lacks all the enzymes of folate-mediated one-carbon metabolism except for glycine hydroxymethyltransferase (GlyA), which has aldolase activities that do not require folate. Another widespread folate-dependent metabolic step is the conversion of dUMP to dTMP, catalyzed by thymidylate synthase (ThyA, EC 2.1.1.45). This step can also be performed by a folate- and flavin-dependent thymidylate synthase (ThyX). As first observed by Myllykallio et al., most bacteria have a thyA or a thyX homolog, some have both, and the few that have neither – such as M. hyopneumoniae or Ureaplasma parvum – contain the tdk gene encoding the thymidine (dT) salvage enzyme thymidine kinase. Our genomic analysis suggests that M. hyopneumoniae strains are the only sequenced bacteria that do not require folate for initiator tRNA formylation or thymidylate synthesis. The situation in the phytoplasma that lacks the fmt gene (Table 1) is different; it contains a thyA homolog like most Mycoplasma species and therefore presumably requires intact folates.
As just discussed, folate is most probably essential for all sequenced bacteria except M. hyopneumoniae. However, not all bacteria synthesize folate de novo; some instead rely on an external supply [see Additional File 1, variant 001; see "Methods" for an explanation of the variant code]. To predict the absence of the de novo synthesis pathway, the HPPK (FolK) and DHPS (FolP) proteins were used as signature proteins (for reasons described below). Many bacteria lack homologs of both these genes (Table 1) and so almost certainly rely on reducing and glutamylating intact folates taken up from the environment. These are mainly host-associated bacteria such as Mycoplasma or Treponema, or organisms that live in folate-rich environments, such as lactobacilli. Chloroplasts and vacuoles must likewise take up folates from the cytoplasm (Figure 4), and there is also evidence for folate uptake by intact plant cells.
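The signature-gene reasoning of the last two paragraphs can be written as a small decision rule. This is a minimal sketch under stated simplifications, not the procedure actually run in SEED: the function name and return strings are hypothetical, a folate requirement is inferred from fmt, thyA, or thyX alone, and a genome with only one of folK and folP is flagged rather than classified.

```python
# Illustrative decision rule; names and categories are assumptions, not SEED output.
FOLATE_USERS = {"fmt", "thyA", "thyX"}   # signature genes for a folate requirement
DE_NOVO_SIGNATURE = {"folK", "folP"}     # HPPK and DHPS, signature genes for synthesis


def folate_status(gene_names):
    """Classify a genome from the set of gene names annotated in it."""
    genes = set(gene_names)
    if not genes & FOLATE_USERS:
        return "no folate requirement predicted"
    found = genes & DE_NOVO_SIGNATURE
    if found == DE_NOVO_SIGNATURE:
        return "de novo synthesis predicted"
    if not found:
        return "folate required, uptake of intact folates predicted"
    return "ambiguous: only one of folK/folP found, check gene calls"


# Hypothetical gene sets loosely modelled on cases discussed in the text.
print(folate_status({"tdk", "glyA"}))
print(folate_status({"fmt", "thyA", "folA"}))
print(folate_status({"fmt", "thyX", "folK", "folP"}))
```

Returning an explicit "ambiguous" category, instead of forcing a call, leaves room for the possibility that a signature gene was simply missed by gene calling or lies in an unfinished part of the genome.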
Systems that mediate folate uptake in auxotrophs such as Lactobacillus casei and L. salivarius have been partially biochemically characterized [33,34], but the corresponding genes remain unknown. Whatever they are, they are most likely unrelated to mammalian folate carriers (i.e., the reduced folate carrier, the folate receptor, the intestinal folate carrier, and the mitochondrial folate carrier) since these lack close homologs among bacteria and plants. However, cyanobacteria, which are folate prototrophs, have a protein with significant similarity to a folate carrier from Leishmania species (FT1), and the cyanobacterial protein has a close homolog in plants (52% amino acid identity), as well as several more distant relatives in plants and in alpha-, beta-, and gamma-proteobacteria. We showed first that the cyanobacterial protein (Synechocystis slr0642) conferred the ability to transport folates and folate analogs when expressed in E. coli, and then that its plant homolog (Arabidopsis At2g32040) did the same . We further showed that the Arabidopsis At2g32040 protein is located in the chloroplast envelope . The weak slr0642 homolog in some alpha-proteobacteria (Silicibacter, Roseobacter) clusters with the folate-dependent enzyme sarcosine dehydrogenase, suggesting that this protein may also be a folate transporter.
Thus, despite progress in identifying folate transporters in cyanobacteria and in the chloroplast envelope, there are as yet no candidates for the folate carriers in many folate-requiring bacterial taxa, or in plant mitochondrial, vacuolar, and plasma membranes. These still-missing genes are future prospects for discovery by comparative genomics methods .
As noted above, DHFR is essential in both de novo and salvage pathways. Most bacteria have a folA gene (DHFR0), but two other bacterial enzymes able to reduce DHF are now known: FolM (DHFR1), belonging to the short-chain dehydrogenase/reductase (SDR) family, and a flavin-dependent dihydropteroate reductase that is fused to dihydropteroate synthase (DHFR2). The trypanosomatid enzyme PTR1 can also reduce DHF and folic acid. As folM occurs in E. coli and other bacteria that also have a folA gene, its normal function is most probably not folate reduction, as discussed in a later section. The annotation of DHFR0 family members is complicated by their similarity to pyrimidine dehydrogenase family members (Pfam01872), which are numerous in Actinomycetes like Streptomyces coelicolor. At this stage we named them all DHFR0 but further genetic or biochemical analysis is needed to check these assignments.
Analysis of the distribution of DHFR genes in bacterial genomes reinforced the conclusions  that many bacteria such as Prochlorococcus marinus lack any recognizable DHFR proteins, and that most of these organisms use ThyX and not ThyA. Even if a high capacity for DHF reduction is not needed in ThyX-dependent organisms , these do require some DHFR activity to complete the de novo or salvage pathways so the corresponding gene(s) have yet to be identified in these organisms  (see Additional File 1, variants 106, 116, 006).
FolC-like proteins can have FPGS activity alone or both DHFS and FPGS activities, which complicates annotation. Although the bifunctional type has a unique dihydropteroate binding site, it overlaps the rest of the substrate binding site and we could not derive a motif to distinguish mono- and bifunctional enzymes. We therefore annotated them all as bifunctional. By analogy with the Lactococcus lactis situation, we predict that organisms reliant on the salvage pathway (see Additional File 1, variants 001 and 011) will have a monofunctional FPGS. The folC gene is missing in the Mycoplasma species that contain an fmt, a thyA and a folA gene and must therefore rely on a salvage pathway (Table 1). This absence points to three possibilities for these species: (a) they import folate polyglutamates; (b) they have a novel type of FPGS gene; or (c) they import monoglutamyl folates and polyglutamylation is not needed. We favor the last hypothesis as there is evidence for monoglutamyl folate uptake in Mycoplasma mycoides. A similar situation must exist in bacteria such as Borrelia burgdorferi that lack all folate synthesis genes but contain THF-dependent enzymes such as Fmt (Table 1).
The majority of sequenced bacteria (250 out of 400) contain all genes of the pathway and are therefore predicted to be prototrophic for folate (see examples in Table 2 and Additional File 1, variant 111). However, a substantial minority lack just one or a few genes of the pterin or pABA branches, and detailed analysis of these cases reveals several biologically significant points.
The first enzyme of this branch, GCHY-I, is encoded in E. coli by the folE gene. A recent analysis of the distribution of folE genes among bacterial genomes showed the folE gene to be locally missing in one-third of them. Another protein family, COG1469, was found to be responsible for 7,8-dihydroneopterin triphosphate formation in these organisms. This protein was named GCHY-IB and the corresponding gene folE2 (Table 2). Further analysis revealed that a few bacteria such as Wolbachia, Chlamydia, and Chlamydophila species lack both folE and folE2 homologs yet contain the pathway signature genes folKP (see Table 2 and Additional File 1, variants 701, 702), suggesting that another family of GCHY-I enzymes has yet to be identified. For instance, at least certain Chlamydia species are known to synthesize folates de novo, but lack folE and folE2. A candidate for the missing GCHY-I enzyme was the Chlamydia trachomatis protein CT610 and its homologs, which cluster with the folABKP folate genes in Chlamydia and Wolbachia species (Figure 5A). The protein is homologous to the pyrroloquinoline quinone (PQQ) biosynthesis protein PqqC that catalyzes an overall eight-electron oxidation, leading to a pyrrole and pyridine ring, but their active sites are not conserved, consistent with a different enzymatic activity. The CT610 gene was cloned in pBAD24 but failed to complement the dT auxotrophy of the E. coli folE mutant. The strong linkage of CT610 homologs with folate genes certainly points to a function in folate metabolism, as de novo folate genes other than folE (such as folQ or pabAabc) are also missing in chlamydiae (Table 2), but further studies are needed to determine its functional role.
The second step of folate synthesis is the removal of pyrophosphate. Although an enzyme mediating this step had been demonstrated in E. coli, no gene was known from any organism. We identified a DHNTP pyrophosphatase (FolQ) candidate in L. lactis as part of the folKEPQC gene cluster (Table 2). FolQ belongs to the Nudix (Nucleoside diphosphate X) hydrolase family. Biochemical and genetic tests confirmed DHNTP pyrophosphatase activity. Furthermore, the closest Arabidopsis homolog of L. lactis FolQ was also shown to have this activity.
Since the Nudix family is large and functionally heterogeneous, it is not very amenable to projection of annotations by homology alone. FolQ homologs with a high homology score occur in rather few bacteria, so that the DHNTP pyrophosphatase gene is still missing in most genomes, including E. coli. Other putative phosphohydrolases unrelated to FolQ (FolQ2, members of the HDIG superfamily) are found in some folate-related gene clusters (Figure 5B), such as CPE1020 in Clostridium perfringens; these genes are good candidates for alternatives to FolQ but again have a limited phylogenetic distribution, leaving the problem open in most bacterial species (Table 2).
The third specific enzyme of the pathway, DHNA, is encoded in E. coli by the folB gene. This gene and its paralog folX appear to be missing in many phylogenetically diverse bacteria such as Geobacter metallireducens. Genome and functional context analysis allows the prediction that the DHNA role is played by members of the transaldolase (EC 2.2.1.2) family (e.g. DVU1658 in Desulfovibrio vulgaris). Specifically, about half the bacteria that lack DHNA have a transaldolase-encoding gene that clusters with folK genes in several organisms (Figure 5C). This prediction awaits experimental validation as this transaldolase family is broad and only some of its members might encode a DHNA aldolase. Some genomes such as Rickettsia felis lack both FolB and transaldolase homologs while containing all the other de novo enzymes (see Table 2 and Additional File 1, variant 401), again suggesting that another family of FolB enzymes has yet to be identified unless the pathway is on its way to elimination in these organisms specifically.
HPPK (FolK) and DHPS (FolP) are distinctive proteins found in all organisms that make folate de novo and so, as noted above, these were used as pathway signature genes. A few sporadic organisms apparently lack one of the two genes, but further analysis shows that this is usually because of a gene-calling problem (a homolog can be found using the tblastn algorithm) or because the corresponding genome is still incomplete. Some organisms, however, have two folP genes or two folK genes (Table 2). Are these functionally redundant, or do they catalyze different reactions? In most cases one paralog is clustered with folate genes and the other clusters with genes involved in different pathways (see Table 2 and Additional File 1). For instance, in the high-GC gram-positive group the second folP (folP2) clusters with cell wall synthesis genes. In Mycobacterium leprae the folP2 gene does not complement an E. coli folP mutant whereas the copy that clusters with the folate genes (folP1) does, suggesting that folP2 is involved in another pathway.
FolK is duplicated in many organisms. In most cases, such as Shewanella denitrificans (Table 2), one copy is in a folate operon and the other in a pantothenate operon, but there are several cases where both genes are close to other folate biosynthesis genes (see also Additional File 1). Only experimental testing will show whether both copies are active. It is of note that an internal duplication of FolK and fusion with FolB is found in Bifidobacterium longum.
The sequenced chlamydiae all lack homologs of folC (DHFS/FPGS) but have folPK homologs (see Table 2 and Additional File 1), making folC a locally missing gene in this group. Inspection revealed that a member of gene family COG1478 is clustered in chlamydiae with folate biosynthesis genes (Figure 5A, folC2). This COG1478 family contains the F420:γ-glutamyl ligase CofE of Archaea and Mycobacteria. CofE catalyzes the GTP-dependent successive addition of two γ-linked L-glutamates to the L-lactyl phosphodiester of 7,8-didemethyl-8-hydroxy-5-deazariboflavin (F420), a reaction analogous to that mediated by FolC. Chlamydiae almost certainly do not make F420 since they lack all the other known cof genes. We accordingly predicted that the CofE homolog in chlamydiae has FolC activity. A cofE homolog (CT611) was shown to complement the methionine and glycine requirements of the E. coli folC mutant SF4, indicating that CT611 can indeed functionally replace FolC (Figure 6). The E. coli folC gene from the ASKA collection was used as a positive control.
We adopted the nomenclature of Xie et al. for the pABA branch genes. These genes are hard to annotate for several reasons. In the first place, they can be fused in various combinations. A fusion between the subunits of ADC synthase (PabAa and PabAb) is a common arrangement, as is fusion between PabAa and ADC lyase (PabAc). In one genome, Corynebacterium diphtheriae, our analysis indicated a triple fusion. The functions of this PabAa-PabAb-PabAc fusion gene (DIP1790) were tested experimentally. The gene was cloned into an expression vector and introduced into an E. coli pabAa pabAb mutant (strain BN1163), which cannot grow on minimal medium unless it expresses a recombinant enzyme with ADC synthase activity. A bifunctional PabAa-PabAb ADC synthase protein from Arabidopsis served as a positive control. Like the positive control, expression of the DIP1790 protein restored pABA prototrophy (Figure 7). This result shows that the DIP1790 protein has ADC synthase activity but does not demonstrate ADC lyase activity because the BN1163 strain has endogenous ADC lyase (PabAc). Enzyme assays were therefore used to test DIP1790 for ADC lyase activity. BN1163 cultures harboring plasmids encoding DIP1790, Arabidopsis ADC synthase, and E. coli PabAc were grown and induced, and proteins were extracted. Extracts of cells expressing DIP1790 were incubated with chorismate and glutamine, without or with E. coli PabAc; pABA was formed in the absence of PabAc whereas, as expected, Arabidopsis PabAa-PabAb formed pABA only if PabAc was added. Reaction rates (nmol pABA h-1 mg-1 protein) were: DIP1790 without PabAc, 7.0; DIP1790 with PabAc, 6.0; Arabidopsis ADCS without PabAc, <0.01; Arabidopsis ADCS with PabAc, 4.0. These data establish that DIP1790 has ADC lyase as well as ADC synthase activity.
Another difficulty in annotating the pabAabc genes is that most organisms contain paralogs of pabAa and pabAb (trpAa and trpAb, respectively) that participate in tryptophan biosynthesis, and in some cases the PabAb (amidotransferase) subunit is shared between the pABA and tryptophan pathways. Finally, PabAc belongs to the large branched-chain amino acid aminotransferase family (EC 2.6.1.42) and is hard to distinguish from these enzymes. These problems mean that the current SEED annotation of the pABA branch of folate synthesis should be taken as tentative. That said, analysis of the distribution of these genes reveals that most bacteria make pABA from chorismate. As expected, many intracellular bacteria lack all pabA genes. In cases where the organisms have the pterin branch but lack all enzymes of the pABA branch, annotation problems cannot be ruled out but an alternative pathway for the biosynthesis of pABA, starting for example with dehydroquinate instead of chorismate, could also be the answer.
The Leishmania pterin reductase PTR1 is a member of the SDR family, but has a highly characteristic motif TGX3RXG (in place of the TGX3GXG motif that is typical of this family). This motif is shared with E. coli FolM and similar SDR family proteins in a variety of bacterial taxa. Several of the folM-like genes are clustered with genes of the pterin branch of folate synthesis (Figure 8), suggesting a function in folate or pterin synthesis. Since E. coli and other bacteria are known to contain tetrahydromonapterin or other tetrahydropterins that could serve as cofactors for pterin-dependent enzymes, we predict that folM-like genes are not primarily involved in folate synthesis but rather are pteridine reductases that, like PTR1, produce and/or reduce 7,8-dihydropterins. (Note that such reductases are distinct from 6,7-dihydropterin reductases [also termed quinonoid pteridine reductases], of which E. coli has two [58,59].)
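The two motifs differ at a single position, so a regular-expression scan is enough to sort SDR sequences into PTR1/FolM-like and typical classes. The sketch below is illustrative rather than the search actually used: X is read as any residue, the function name is hypothetical, and the two peptide fragments are made-up examples, not real protein sequences.

```python
import re

# X3 in the motifs is read as "any three residues"; a single X as "any one residue".
PTR1_LIKE_MOTIF = re.compile(r"TG[A-Z]{3}R[A-Z]G")    # TGX3RXG (PTR1, FolM)
TYPICAL_SDR_MOTIF = re.compile(r"TG[A-Z]{3}G[A-Z]G")  # TGX3GXG (most SDRs)


def classify_sdr_motif(protein_seq):
    """Report which nucleotide-binding motif variant a protein sequence carries."""
    seq = protein_seq.upper()
    if PTR1_LIKE_MOTIF.search(seq):
        return "PTR1/FolM-like (TGX3RXG)"
    if TYPICAL_SDR_MOTIF.search(seq):
        return "typical SDR (TGX3GXG)"
    return "no motif found"


# Made-up fragments for illustration only.
print(classify_sdr_motif("MKVALVTGAAKRLGRSI"))  # PTR1/FolM-like (TGX3RXG)
print(classify_sdr_motif("MKVALVTGASSGIGKAI"))  # typical SDR (TGX3GXG)
```

A motif hit is suggestive only; as the text notes, gene clustering and biochemical tests are still needed to assign a function.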
Consistent with this prediction, the recombinant FolM protein catalyzes reduction of dihydrobiopterin to the tetrahydro form; unlike PTR1, however, it does not mediate reduction of fully oxidized biopterin to the dihydro form. Supporting the latter observation, we found that an E. coli GCHY-I mutant (which is unable to make pterins) can use the dihydro but not the oxidized forms of neopterin, monapterin, or 6-hydroxymethylpterin to support folate synthesis. Furthermore, expression of a typical folM-like gene (Xylella fastidiosa PD0677; Figure 5D, Table 2) from a plasmid did not enable this mutant to use oxidized pterins, indicating that – like FolM – the PD0677 gene product does not act on oxidized pterins (Figure 8). In control experiments in which Leishmania PTR1 was expressed from a plasmid, the mutant was able to use oxidized pterins, confirming that it is oxidized pterin reduction (and not uptake) that is lacking in E. coli (Figure 8).
Searching the Arabidopsis genome revealed some 86 members of the SDR family, of which none had the TGX3RXG motif. This led to the prediction that Arabidopsis would be unable to salvage oxidized pterins, which was verified by showing that 6-hydroxymethylpterin was not reduced in vivo or in vitro, and was not incorporated into folates .
This analysis and integration study demonstrates that simple phylogenomic analysis of a biochemical pathway – even a well-known one – can unearth globally missing (e.g., folQ) or locally missing (e.g., folE2 or folC2) genes in bacteria and plants and reveals that many open questions remain (such as the missing folQ, folB, folE cases listed in Table 2). It can also identify, or suggest functions for, additional genes related to the pathway (e.g., folM). Such analysis can thus lead to discovery of potential new drug or herbicide targets such as GCHY-IB, which occurs in many pathogenic bacteria but not in mammals, or the chloroplast folate carrier that is likewise absent from mammals.
It should be noted that the content of the current SEED folate subsystem captures the present status of an ongoing annotation effort, that the content will be refined and improved as more bacterial and plant genomes are added, and that further predictions are expected to emerge. Finally, we emphasize that the predictions herein are offered with the hope that others will find them useful in their own research.
Analysis of the folate subsystem was performed in the SEED database . Results are available in the 'Folate biosynthesis sub-system' on the public SEED server at . The snapshot of this analysis on the SEED database is given in the additional file. Phylogenetic pattern searches were made on the NMPDR SEED server at  to find candidates for the missing folE and folC genes. We also used the Blast tools and resources at NCBI  and the comparative genomics platforms STRING  for additional gene clustering analysis tools.
Annotations for paralog families were made using physical clustering on the chromosome when possible or by building phylogenetic trees using the ClustalW tool  integrated in SEED or deriving specific protein motifs. Pseudogenes (i.e., those encoding clearly aberrant proteins) were ignored; these are not uncommon in the folate pathways of intracellular parasites undergoing genome reduction .
The 'variant code' is used in SEED to schematize the type of pathways found in a given organism . A three-digit code was used. Digit one describes the pterin branch of the pathway: 1 = complete, 0 = HPPK and DHPS missing, 4 = DHNA missing, 7 = GCHY-I missing. Digit two describes the pABA branch: 1 = two or three of the pabAabc genes present, 0 = all pabAabc genes missing or just one present. Digit three describes the salvage pathway: 1 = complete; 0 = FPGS and DHFR missing; 2 = FPGS missing; 6 = DHFR missing. Variant -1 represents genomes with no pathway genes but no need for them because no folate-dependent enzymes are present. Particular care was given to annotation of fused proteins, which are common in both branches of the pathway; SEED has annotation tools to deal with fusion proteins .
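Because the variant code is a fixed three-position lookup, it can be decoded mechanically. The sketch below transcribes the digit meanings given above; the dictionary and function names are illustrative and are not part of SEED.

```python
# Digit meanings transcribed from the text; all names here are illustrative.
PTERIN_BRANCH = {
    "1": "complete",
    "0": "HPPK and DHPS missing",
    "4": "DHNA missing",
    "7": "GCHY-I missing",
}
PABA_BRANCH = {
    "1": "two or three pabAabc genes present",
    "0": "all pabAabc genes missing or just one present",
}
SALVAGE = {
    "1": "complete",
    "0": "FPGS and DHFR missing",
    "2": "FPGS missing",
    "6": "DHFR missing",
}


def decode_variant(code):
    """Translate a variant code such as '701' into a per-branch description."""
    if code == "-1":
        return {"note": "no pathway genes and no folate-dependent enzymes"}
    if len(code) != 3:
        raise ValueError(f"expected a three-digit variant code, got {code!r}")
    tables = (("pterin", PTERIN_BRANCH), ("pABA", PABA_BRANCH), ("salvage", SALVAGE))
    decoded = {}
    for digit, (branch, table) in zip(code, tables):
        if digit not in table:
            raise ValueError(f"unknown {branch} digit {digit!r} in {code!r}")
        decoded[branch] = table[digit]
    return decoded


for code in ("111", "001", "701", "116"):
    print(code, decode_variant(code))
```

Codes are handled as strings rather than integers so that leading zeros, as in variant 001, are preserved.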
Bacteria were routinely grown at 37°C in LB medium (BD Diagnostic Systems), in minimal medium  supplemented with 0.2% (v/v) glycerol, or in M9 medium . Agar (BD Diagnostic Systems) concentration in plates was 15 g l-1. Transformations were by standard procedures [69,70]. Thymidine (dT, 300 μM), ampicillin (100 μg ml-1), kanamycin (50 or 100 μg ml-1), tetracycline (10 μg ml-1), isopropyl-β-D-thiogalactopyranoside (IPTG, 0.5 or 1 mM), methionine (100 μg ml-1), glycine (100 μg ml-1), pABA (0.5 μg ml-1) and L-arabinose (0.02–0.2%, w/v) were added as required. Strains Topo10 (Invitrogen), BL21-CodonPlus (DE3)-RIL (Stratagene), DH10B, or DH5α were used for cloning and expression. SF4 (F-strA recA folC srlC::Tn10) , BN1163 (pabA1, pab-B::Kan, rpsL704, ilvG-, rfb-50, rph-1) (B. Nichols, University of Chicago), and MG1655 (ΔfolE::KanR)  were used for complementation tests.
The Chlamydia trachomatis CT610 and CT611 genes were cloned in pBAD24  using the following primers: CT610, 5'-AATACCATGGTGGAGGTGTTTATGAA-3' and 5'-AATAAAGCTTTTAATAAGATTGATGACAACTAC-3'; CT611, 5'-AATACCATGGAAATAACTCCGATCAAAACAC-3' and 5'-AATAAAGCTTTCATTTCTTTTCTTGACTCCAC-3'. Genomic DNA from C. trachomatis, LGV-II, strain 434 was obtained from ABI (Maryland). PCR products were obtained and purified as described , then digested with NcoI/HindIII before ligation into plasmid pBAD24 digested with the same enzymes and transformation into Topo10 cells (Invitrogen). The respective plasmids named pBY149.9 (expressing CT610) and pBY143.1 (expressing CT611) were checked by sequencing.
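As a quick sanity check on the cloning design, one can confirm that each forward primer carries an NcoI site and each reverse primer a HindIII site. The primer sequences below are copied from the text; the recognition sequences (CCATGG for NcoI, AAGCTT for HindIII) are standard values supplied here as an assumption, and the helper function is hypothetical.

```python
# Recognition sequences are standard values, not stated in the text.
SITES = {"NcoI": "CCATGG", "HindIII": "AAGCTT"}

# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "CT610_fwd": "AATACCATGGTGGAGGTGTTTATGAA",
    "CT610_rev": "AATAAAGCTTTTAATAAGATTGATGACAACTAC",
    "CT611_fwd": "AATACCATGGAAATAACTCCGATCAAAACAC",
    "CT611_rev": "AATAAAGCTTTCATTTCTTTTCTTGACTCCAC",
}


def sites_in(primer):
    """List the enzymes whose recognition sequence occurs in the primer."""
    seq = primer.upper()
    return [enzyme for enzyme, site in SITES.items() if site in seq]


for name, seq in PRIMERS.items():
    print(name, sites_in(seq))
# CT610_fwd ['NcoI']
# CT610_rev ['HindIII']
# CT611_fwd ['NcoI']
# CT611_rev ['HindIII']
```

Both sites are palindromic, so searching one strand of the primer is sufficient.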
The Corynebacterium diphtheriae DIP1790 gene was cloned into pGEM-T Easy (Promega) after amplification from genomic DNA (obtained from the American Type Culture Collection) using the primers 5'-GCGGCCGCCACAGGAAACAGCTATGGTTATGCAACGCGCGCA-3' and 5'-GAGCTCTCACACTTGGGCGATATTCT-3'. The SstI site in the gene was ablated by PCR using the internal primers 5'-TCATCACCGAaCtTGAAGGCA-3' and 5'-TTTGCCTTCAaGtTCGGTGATG-3' (changed nucleotides in lower case). The modified gene was ligated into pGEM-T Easy and verified by sequencing. It was then excised with NotI and SstI and ligated into pLOI707HE. This construct was used to transform E. coli BN1163. Complementation tests were carried out on minimal medium, appropriately supplemented as above.
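The two internal primers used to ablate the SstI site would be expected to anneal to opposite strands of the same region, so one should be largely the reverse complement of the other, and neither should retain the SstI recognition sequence (GAGCTC). A minimal sketch of that check follows; the primer sequences are copied from the text, while the function names are hypothetical and introduced only for illustration.

```python
# Sanity check for the pair of mutagenic primers described above.
# Sequences are copied from the text (lower case marks changed nucleotides).

FWD = "TCATCACCGAaCtTGAAGGCA"
REV = "TTTGCCTTCAaGtTCGGTGATG"
SSTI_SITE = "GAGCTC"  # standard SstI (SacI) recognition sequence

# Case-preserving complement table, so the lower-case markers are kept.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> str:
    """Brute-force longest common substring; adequate for primer-length input."""
    best = ""
    for i in range(len(a)):
        # Only try substrings longer than the best one found so far.
        for j in range(i + len(best) + 1, len(a) + 1):
            if a[i:j] in b:
                best = a[i:j]
            else:
                break  # extending a non-matching substring cannot match
    return best


if __name__ == "__main__":
    overlap = longest_common_substring(FWD, reverse_complement(REV))
    print("shared region:", overlap, "length:", len(overlap))
    print("SstI site in forward primer:", SSTI_SITE in FWD.upper())
    print("SstI site in reverse primer:", SSTI_SITE in REV.upper())
```

Because the comparison is case-sensitive, a long shared region also indicates that the lower-case (changed) positions in the two primers line up with each other.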
The Xylella fastidiosa Temecula1 PD0677 amplicon, preceded by a Shine-Dalgarno sequence and a stop codon in frame with LacZα, was cloned between the EcoRI and KpnI sites of pBluescript SK-. The PCR template was genomic DNA from the American Type Culture Collection; primers were 5'-AGTCAGAATTCGTGAAGGAAACAGCTATGTCAGATCCCTCTAAAGTC-3' and 5'-AGTAGGTACCTCATGTCAGCGTGCGGCC-3'; amplification was with KOD HiFi polymerase. The deduced amino acid sequence differed from the published sequence in having serine rather than isoleucine at position 57. The PTR1 construct was as described. The constructs were introduced into E. coli folE deletant cells. Transformants were grown on LB plates supplemented appropriately as above.
Protein extracts were prepared as described from IPTG-induced cultures of strain BN1163 harboring one of three pLOI707HE constructs – DIP1790, Arabidopsis ADCS, or vector alone – and of BL21-CodonPlus (DE3)-RIL harboring E. coli PabC cloned in pJMG30. Glutamine-dependent pABA synthesis activity was assayed as described. PabC extract (3.8 μg protein) was added when indicated. Assays were incubated for 1 h at 37°C, stopped with 20 μl of 75% (v/v) acetic acid, held on ice for 1 h, then stored at -80°C until analysis. pABA was estimated by HPLC with fluorescence detection.
BElY carried out the complementation studies on the Chlamydia cofE homolog. RDdelaG performed the complementation and biochemical assays on the Corynebacterium pABA synthesis protein. AN carried out the studies on the Xylella folM homolog. VdeC-L and ADH conceived the study, carried out bioinformatic work, and drafted the manuscript. All authors read and approved the final manuscript.
Spreadsheet summarizing the distribution, clustering, and fusions of bacterial and plant folate synthesis genes. This table is an archival snapshot of the "BMCgenomics 2007" table in the Folate Biosynthesis sub-system on the public SEED website http://theseed.uchicago.edu/FIG/index.cgi. Clustering is shown by similar color backgrounds. Genome and protein IDs are from the SEED database. Abbreviations for the functional roles are given on the first page of the spreadsheet, and gene distribution in all analyzed genomes on the second page. Note that the SEED table is the primary source to which the reader is directed; it is not static but changes over time as new genomes become available and predictions are tested and validated.
We thank B. Shane for the SF4 strain, B. Nichols for the BN1163 strain, H. Mori for the pfolCEc plasmid, and G. Basset for help with enzyme assays. We thank A. Osterman for insightful suggestions and help with figure design. This work was supported in part by National Institutes of Health grants R01 GM70641-01 (to V. de C.-L.) and R01 GM071382 (to A.D.H.), by the National Research Initiative of the USDA Cooperative State Research, Education, and Extension Service, grant number 2005-35318-15228 (to A.D.H.), and by an endowment from the C.V. Griffin, Sr. Foundation.