The obligate-heritable endosymbionts of insects possess some of the smallest known bacterial genomes. This is likely due to loss of genomic material during symbiosis. The mode and rate of this erosion may change over evolutionary time: faster in newly formed associations and slower in long-established ones. The endosymbionts of human and anthropoid primate lice present a unique opportunity to study genome erosion in newly established (or young) symbionts. This is because we have a detailed phylogenetic history of these endosymbionts with divergence dates for closely related species. This allows for genome evolution to be studied in detail and rates of change to be estimated in a phylogenetic framework. Here, we sequenced the genome of the chimpanzee louse endosymbiont (Candidatus Riesia pediculischaeffi) and compared it with the closely related genome of the human body louse endosymbiont. From this comparison, we found evidence for recent genome erosion leading to gene loss in these endosymbionts. Although gene loss was detected, it was not significantly greater than in older endosymbionts from aphids and ants. Additionally, we searched for genes associated with B-vitamin synthesis in the two louse endosymbiont genomes because these endosymbionts are believed to synthesize essential B vitamins absent in the louse’s diet. All of the expected genes were present, except those involved in thiamin synthesis. We failed to find genes encoding for proteins involved in the biosynthesis of thiamin or any complete exogenous means of salvaging thiamin, suggesting there is an undescribed mechanism for the salvage of thiamin. Finally, genes encoding for the pantothenate de novo biosynthesis pathway were located on a plasmid in both taxa along with a heat shock protein. Movement of these genes onto a plasmid may be functionally and evolutionarily significant, potentially increasing production and guarding against the deleterious effects of mutation. 
These data add to a growing resource of obligate endosymbiont genomes and to our understanding of the rate and mode of genome erosion in obligate animal-associated bacteria. Ultimately, sequencing additional louse p-endosymbiont genomes will provide a model system for studying genome evolution in obligate host-associated bacteria.
gene loss; genome erosion; primary-endosymbiont; gamma-proteobacteria; Pediculus
Methylation is a versatile reaction involved in the synthesis and modification of biologically active molecules, including RNAs. N6-methyl-threonylcarbamoyl adenosine (m6t6A) is a post-transcriptional modification found at position 37 of tRNAs from bacteria, insects, plants, and mammals. Here, we report that in Escherichia coli, yaeB (renamed as trmO) encodes a tRNA methyltransferase responsible for the N6-methyl group of m6t6A in tRNAThr specific for ACY codons. TrmO has a unique single-sheeted β-barrel structure and does not belong to any known classes of methyltransferases. Recombinant TrmO employs S-adenosyl-L-methionine (AdoMet) as a methyl donor to methylate t6A to form m6t6A in tRNAThr. Therefore, TrmO/YaeB represents a novel category of AdoMet-dependent methyltransferase (Class VIII). In a ΔtrmO strain, m6t6A was converted to cyclic t6A (ct6A), suggesting that t6A is a common precursor for both m6t6A and ct6A. Furthermore, N6-methylation of t6A enhanced the attenuation activity of the thr operon, suggesting that TrmO ensures efficient decoding of ACY. We also identified a human homolog, TRMO, indicating that m6t6A plays a general role in fine-tuning of decoding in organisms from bacteria to mammals.
Folates are tripartite molecules comprising pterin, para-aminobenzoate (PABA), and glutamate moieties, which are essential cofactors involved in DNA and amino acid synthesis. The obligately intracellular Chlamydia species have lost several biosynthetic pathways for essential nutrients which they can obtain from their host but have retained the capacity to synthesize folate. In most bacteria, synthesis of the pterin moiety of folate requires the FolEQBK enzymes, while synthesis of the PABA moiety is carried out by the PabABC enzymes. Bioinformatic analyses reveal that while members of Chlamydia are missing the genes for FolE (GTP cyclohydrolase) and FolQ, which catalyze the initial steps in de novo synthesis of the pterin moiety, they have genes for the rest of the pterin pathway. We screened a chlamydial genomic library in deletion mutants of Escherichia coli to identify the “missing genes” and identified a novel enzyme, TrpFCtL2, which has broad substrate specificity. TrpFCtL2, in combination with GTP cyclohydrolase II (RibA), the first enzyme of riboflavin synthesis, provides a bypass of the first two canonical steps in folate synthesis catalyzed by FolE and FolQ. Notably, TrpFCtL2 retains the phosphoribosyl anthranilate isomerase activity of the original annotation. Additionally, we independently confirmed the recent discovery of a novel enzyme, CT610, which uses an unknown precursor to synthesize PABA and complements E. coli mutants with deletions of pabA, pabB, or pabC. Thus, Chlamydia species have evolved a variant folate synthesis pathway that employs a patchwork of promiscuous and adaptable enzymes recruited from other biosynthetic pathways.
Collectively, the involvement of TrpFCtL2 and CT610 in the tetrahydrofolate pathway completes our understanding of folate biosynthesis in Chlamydia. Moreover, the novel roles for TrpFCtL2 and CT610 in the tetrahydrofolate pathway are sophisticated examples of how enzyme evolution plays a vital role in the adaptation of obligately intracellular organisms to host-specific niches. Enzymes like TrpFCtL2 which possess an enzyme fold common to many other enzymes are highly versatile and possess the capacity to evolve to catalyze related reactions in two different metabolic pathways. The continued identification of unique enzymes such as these in bacterial pathogens is important for development of antimicrobial compounds, as drugs that inhibit such enzymes would likely not have any targets in the host or the host’s normal microbial flora.
The availability of thousands of sequenced genomes has revealed the diversity of biochemical solutions to similar chemical problems. Even for molecules at the heart of metabolism, such as cofactors, the pathway enzymes first discovered in model organisms like Escherichia coli or Saccharomyces cerevisiae are often not universally conserved. Tetrahydrofolate (THF) (or its close relative tetrahydromethanopterin) is a universal and essential C1-carrier that most microbes and plants synthesize de novo. The THF biosynthesis pathway and enzymes are, however, not universal and alternate solutions are found for most steps, making this pathway a challenge to annotate automatically in many genomes. Comparing THF pathway reconstructions and functional annotations of a chosen set of folate synthesis genes in specific prokaryotes revealed the strengths and weaknesses of different microbial annotation platforms. This analysis revealed that most current platforms fail in metabolic reconstruction of variant pathways. However, all the pieces are in place to quickly correct these deficiencies if the different databases were built on each other's strengths.
Non-orthologous displacements; Paralogs; Metabolic reconstruction
Mollicutes is a class of parasitic bacteria that have evolved from a common Firmicutes ancestor mostly by massive genome reduction. With genomes under 1 Mbp in size, most Mollicutes species retain the capacity to replicate and grow autonomously. The major goal of this work was to identify the minimal set of proteins that can sustain ribosome biogenesis and translation of the genetic code in these bacteria. Using the experimentally validated genes from the model bacteria Escherichia coli and Bacillus subtilis as input, genes encoding proteins of the core translation machinery were predicted in 39 distinct Mollicutes species, 33 of which are culturable. The set of 260 input genes encodes proteins involved in ribosome biogenesis, tRNA maturation and aminoacylation, as well as protein cofactors required for mRNA translation and RNA decay. A core set of 104 of these proteins is found in all species analyzed. Genes encoding proteins involved in post-translational modifications of ribosomal proteins and translation cofactors, in post-transcriptional modifications of tRNA and rRNA, in ribosome assembly, and in RNA degradation are the most frequently lost. As expected, genes coding for aminoacyl-tRNA synthetases, ribosomal proteins and initiation, elongation and termination factors are the most persistent (i.e. conserved in a majority of genomes). Enzymes introducing nucleotide modifications in the anticodon loop of tRNA, in helix 44 of 16S rRNA and in helices 69 and 80 of 23S rRNA, all essential for decoding and facilitating peptidyl transfer, are maintained in all species. Reconstruction of genome evolution in Mollicutes revealed that, besides many gene losses, occasional gains by horizontal gene transfer also occurred.
This analysis not only showed that slightly different solutions for preserving a functional, albeit minimal, protein-synthesizing machinery have emerged in these successive rounds of reductive evolution but also has broad implications for guiding the reconstruction of a minimal cell by synthetic biology approaches.
In all cells, proteins are synthesized from the message encoded by mRNA using complex machineries involving many proteins and RNAs. In this process, named translation, the ribosome plays a central role. The elements involved in both ribosome biogenesis and its function are extremely conserved in all organisms from the simplest bacteria to mammalian cells. Most of the 260 known proteins involved in translation have been identified and studied in the bacteria Escherichia coli and Bacillus subtilis, two common cellular models in biology. However, comparative genomics has shown that the translation protein set can be much smaller. This is true for bacteria belonging to the class Mollicutes that are characterized by reduced genomes and hence considered as models for minimal cells. Using homology inference approach and expert analyses, we identified the translation apparatus proteins for 39 of these organisms. Although striking variations were found from one group of species to another, some Mollicutes species require half as many proteins as E. coli or B. subtilis. This analysis allowed us to determine a set of proteins necessary for translation in Mollicutes and define the translation apparatus that would be required in a cellular chassis mimicking a minimal bacterial cell.
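The distinction drawn above between a core set (present in all species analyzed) and persistent genes (conserved in a majority of genomes) can be illustrated with a minimal Python sketch. The species labels, gene labels, and the 50% majority threshold below are hypothetical placeholders chosen for illustration; they are not data or parameters from the study.

```python
from collections import Counter


def core_and_persistent(presence, majority=0.5):
    """Split genes into 'core' (found in every genome) and
    'persistent' (found in more than `majority` of genomes).

    presence: dict mapping a genome label to the set of genes detected in it.
    The 0.5 threshold is an assumption made for this sketch.
    """
    n_genomes = len(presence)
    # Count in how many genomes each gene occurs.
    counts = Counter(gene for genes in presence.values() for gene in genes)
    core = {g for g, c in counts.items() if c == n_genomes}
    persistent = {g for g, c in counts.items() if c / n_genomes > majority}
    return core, persistent


# Hypothetical toy presence/absence table (not real Mollicutes data).
presence = {
    "species_A": {"gene_w", "gene_x", "gene_y", "gene_z"},
    "species_B": {"gene_w", "gene_x", "gene_z"},
    "species_C": {"gene_w", "gene_y", "gene_z"},
}

core, persistent = core_and_persistent(presence)
print(sorted(core))        # genes present in all three toy genomes
print(sorted(persistent))  # genes present in more than half of them
```

By construction, every core gene is also persistent; the genes in the persistent set but outside the core are the ones lost in at least one lineage.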
Experimental data exist for only a vanishingly small fraction of sequenced microbial genes. This community page discusses the progress made by the COMBREX project to address this important issue using both computational and experimental resources.
The essential coenzyme NAD plays important roles in metabolic reactions and cell regulation in all organisms. As such, NAD synthesis has been investigated as a source for novel antibacterial targets. Cross-species genomics-based reconstructions of NAD metabolism in group A streptococci (GAS), combined with focused experimental testing in Streptococcus pyogenes, led to a better understanding of NAD metabolism in the pathogen. The predicted niacin auxotrophy was experimentally verified, as well as the essential role of the nicotinamidase PncA in the utilization of nicotinamide (Nm). PncA is dispensable in the presence of nicotinate (Na), ruling it out as a viable antibacterial target. The function of the “orphan” NadC enzyme, which is uniquely present in all GAS species despite the absence of other genes of NAD de novo synthesis, was elucidated. Indeed, the quinolinate (Qa) phosphoribosyltransferase activity of NadC from S. pyogenes allows the organism to sustain growth when Qa is present as a sole pyridine precursor. Finally, the redundancy of functional upstream salvage pathways in GAS species narrows the choice of potential drug targets to the two indispensable downstream enzymes of NAD synthesis, nicotinate adenylyltransferase (NadD family) and NAD synthetase (NadE family). Biochemical characterization of NadD confirmed its functional role in S. pyogenes, and its potential as an antibacterial target was supported by inhibition studies with previously identified class I inhibitors of the NadD enzyme family. One of these inhibitors efficiently inhibited S. pyogenes NadD (sp.NadD) in vitro (50% inhibitory concentration [IC50], 15 μM), exhibiting a noncompetitive mechanism with a Ki of 8 μM.
A comparative genomic analysis predicted that many members of the under-characterized COG0523 subfamily of putative P-loop GTPases function in metal metabolism. In this work we focused on the uncharacterized Escherichia coli protein YeiR by studying both the physiology of a yeiR mutant and the in vitro biochemical properties of YeiR expressed as a fusion with the maltose-binding protein (YeiR-MBP). Our results demonstrate that deletion of yeiR increases the sensitivity of E. coli to EDTA or cadmium and that this phenotype is linked to zinc depletion. In vitro, the tagged protein binds several Zn2+ ions with nanomolar affinity and oligomerizes in the presence of metal. The GTPase activity of YeiR is similar to that measured for other members of the group, but GTP hydrolysis is enhanced by Zn2+ binding. These results support the predicted connection between the COG0523 P-loop GTPases and roles in metal homeostasis.
GTPase; zinc homeostasis; metal-binding; COG0523; cadmium
Archaeosine (G+) is found at position 15 of many archaeal tRNAs. In Euryarchaeota, the G+ precursor, 7-cyano-7-deazaguanine (preQ0), is inserted into tRNA by tRNA-guanine transglycosylase (arcTGT) before conversion into G+ by ARChaeosine Synthase (ArcS). However, many Crenarchaeota known to harbor G+ lack ArcS homologs. Using comparative genomics approaches, two families that could functionally replace ArcS in these organisms were identified: 1) GAT-QueC, a two-domain family with an N-terminal glutamine amidotransferase class-II domain fused to a domain homologous to QueC, the enzyme that produces preQ0; 2) QueF-like, a family homologous to the bacterial enzyme catalyzing the reduction of preQ0 to 7-aminomethyl-7-deazaguanine. Here we show that these two protein families are able to catalyze the formation of G+ in a heterologous system. Structure and sequence comparisons of crenarchaeal and euryarchaeal arcTGTs suggest the crenarchaeal enzymes have broader substrate specificity. These results led to a new model for the synthesis and salvage of G+ in Crenarchaeota.
The biosynthesis of GTP-derived metabolites such as tetrahydrofolate (THF), biopterin (BH4), and the modified tRNA nucleosides queuosine (Q) and archaeosine (G+) relies on several enzymes of the Tunnel-fold superfamily. A subset of these proteins includes the 6-pyruvoyl-tetrahydropterin synthase (PTPS-II), PTPS-III, and PTPS-I homologs, all members of the COG0720 family, that have been previously shown to transform 7,8-dihydroneopterin triphosphate (H2NTP) into different products. PTPS-II catalyzes the formation of 6-pyruvoyltetrahydropterin in the BH4 pathway. PTPS-III catalyzes the formation of 6-hydroxymethyl-7,8-dihydropterin in the THF pathway. PTPS-I catalyzes the formation of 6-carboxy-5,6,7,8-tetrahydropterin in the Q pathway. Genes of these three enzyme families are often misannotated as they are difficult to differentiate by sequence similarity alone. Using a combination of physical clustering, signature motif, and phylogenetic co-distribution analyses, in vivo complementation studies, and in vitro enzymatic assays, a complete reannotation of the COG0720 family was performed in prokaryotes. Notably, this work identified and experimentally validated dual-function PTPS-I/III enzymes involved in both THF and Q biosynthesis. Both in vivo and in vitro analyses showed that the PTPS-I family could tolerate a translocation of the active site cysteine and was inherently promiscuous, catalyzing different reactions on the same substrate, or the same reaction on different substrates. Finally, the analysis and experimental validation of several archaeal COG0720 members confirmed the role of PTPS-I in archaeosine biosynthesis, and resulted in the identification of PTPS-III enzymes with variant signature sequences in Sulfolobus species. This study reveals an expanded versatility of the COG0720 family members and illustrates that for certain protein families, extensive comparative genomic analysis beyond homology is required to correctly predict function.
Queuosine; archaeosine; tetrahydrofolate; biopterin; tRNA modification; riboflavin; 6-pyruvoyl-tetrahydropterin synthase
A comparative genomic analysis of Candidatus Riesia pediculicola, the recently sequenced unicellular endosymbiont of the human body louse, which has a reduced genome (582 Kb), revealed that it is the only known organism that might have lost all post-transcriptional base and ribose modifications of the tRNA body, retaining only modifications of the anticodon-stem-loop essential for mRNA decoding. Such a minimal tRNA modification set was not observed in other insect symbionts or in parasitic unicellular bacteria, such as Mycoplasma genitalium (580 Kb), that have also evolved by considerably reducing their genomes. This could be an example of a minimal tRNA modification set required for life, a question that has been at the center of the field for many years, especially for understanding the emergence and evolution of the genetic code.
tRNA; maturation; translation; modified nucleosides; comparative genomics
The availability of over 3000 published genome sequences has enabled the use of comparative genomic approaches to drive the biological function discovery process. Classically, genes were linked to functions by genetic or biochemical approaches, a lengthy process that often took years. Phylogenetic distribution profiles, physical clustering, gene fusion, co-expression profiles, structural information and other genomic or post-genomic derived associations can now be used to make very strong functional hypotheses. Here, we illustrate this shift with the analysis of the DUF71/COG2102 family, a subgroup of the PP-loop ATPase family.
The DUF71 family contains at least two subfamilies, one of which was predicted to be the missing diphthine-ammonia ligase (EC 6.3.1.14), Dph6. This enzyme catalyzes the last ATP-dependent step in the synthesis of diphthamide, a complex modification of Elongation Factor 2 that can be ADP-ribosylated by bacterial toxins. Dph6 orthologs are found in nearly all sequenced Archaea and Eucarya, as expected from the distribution of the diphthamide modification. The DUF71 family appears to have originated in the Archaea/Eucarya ancestor and to have been subsequently horizontally transferred to Bacteria. Bacterial DUF71 members likely acquired a different function because the diphthamide modification is absent in this domain of life. In-depth investigations suggest that some archaeal and bacterial DUF71 proteins participate in B12 salvage.
This detailed analysis of the DUF71 family members provides an example of the power of integrated data-mining for solving important “missing genes” or “missing function” cases and illustrates the danger of functional annotation of protein families by homology alone.
This article was reviewed by Arcady Mushegian, Michael Galperin and L. Aravind.
Diphthamide; Vitamin B12; Amidotransferase; Comparative genomics
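Phylogenetic distribution profiles, one of the homology-independent associations mentioned above, can be compared with a simple set-similarity score. The sketch below uses the Jaccard index as one common choice; it is not stated in the abstract which measure the authors used, and the genome and family labels are hypothetical placeholders.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two phyletic profiles, each given as the
    set of genomes in which a protein family is present."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_profile(query_name, profiles):
    """Rank every other family by similarity of its phyletic profile to the
    query family's profile, highest first."""
    query = profiles[query_name]
    scores = {
        name: jaccard(query, genomes)
        for name, genomes in profiles.items()
        if name != query_name
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy profiles: family -> genomes where it occurs.
profiles = {
    "query_family": {"g1", "g2", "g3", "g5"},
    "candidate_1": {"g1", "g2", "g3", "g5"},  # identical distribution
    "candidate_2": {"g1", "g4"},              # mostly different distribution
}

ranking = rank_by_profile("query_family", profiles)
print(ranking)
```

A family whose distribution closely tracks that of the query is a candidate for functional association, which then needs support from other evidence such as physical clustering or experimental tests.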
KEOPS is an important cellular complex conserved in Eukarya, with some subunits conserved in Archaea and Bacteria. This complex was recently found to play an essential role in formation of the tRNA modification threonylcarbamoyladenosine (t6A), and was previously associated with telomere length maintenance and transcription. KEOPS subunits are conserved in Archaea, especially in the Euryarchaea, where they had been studied in vitro. Here we attempted to delete the genes encoding the four conserved subunits of the KEOPS complex in the euryarchaeote Haloferax volcanii and study their phenotypes in vivo. The fused kae1-bud32 gene was shown to be essential as was cgi121, which is dispensable in yeast. In contrast, pcc1 (encoding the putative dimerizing unit of KEOPS) was not essential in H. volcanii. Deletion of pcc1 led to pleiotropic phenotypes, including decreased growth rate, reduced levels of t6A modification, and elevated levels of intra-cellular glycation products.
Experimental evolution via continuous culture is a powerful approach to the alteration of complex phenotypes, such as optimal/maximal growth temperatures. The benefit of this approach is that phenotypic selection is tied to growth rate, allowing the production of optimized strains. Herein, we demonstrate the use of a recently described long-term culture apparatus called the Evolugator for the generation of a thermophilic descendant from a mesophilic ancestor (Escherichia coli MG1655). In addition, we used whole-genome sequencing of sequentially isolated strains throughout the thermal adaptation process to characterize the evolutionary history of the resultant genotype, identifying 31 genetic alterations that may contribute to thermotolerance, although some of these mutations may be adaptive for off-target environmental parameters, such as rich medium. We undertook preliminary phenotypic analysis of mutations identified in the glpF and fabA genes. Deletion of glpF in a mesophilic wild-type background conferred significantly improved growth rates in the 43-to-48°C temperature range and altered optimal growth temperature from 37°C to 43°C. In addition, transforming our evolved thermotolerant strain (EVG1064) with a wild-type allele of glpF reduced fitness at high temperatures. On the other hand, the mutation in fabA predictably increased the degree of saturation in membrane lipids, which is a known adaptation to elevated temperature. However, transforming EVG1064 with a wild-type fabA allele had only modest effects on fitness at intermediate temperatures. The Evolugator is fully automated and demonstrates the potential to accelerate the selection for complex traits by experimental evolution and significantly decrease development time for new industrial strains.
Nearly 2200 genomes encoding some 6 million proteins have now been sequenced. Around 40% of these proteins are of unknown function even when function is loosely and minimally defined as “belonging to a superfamily”. In addition to in silico methods, the swelling stream of high-throughput experimental data can give valuable clues for linking these “unknowns” with precise biological roles. The goal is to develop integrative data-mining platforms that allow the scientific community at large to access and utilize this rich source of experimental knowledge. To this end, we review recent advances in generating whole-genome experimental datasets, where this data can be accessed, and how it can be used to drive prediction of gene function.
DksA is a global transcriptional regulator that directly interacts with RNA polymerase (RNAP) and, in conjunction with the alarmone ppGpp, alters transcription initiation at target promoters. DksA proteins studied to date contain a canonical Cys4 Zn-finger motif thought to be essential for their proper folding and thus activity. In addition to the canonical DksA protein, the Pseudomonas aeruginosa genome encodes a closely related paralog, DksA2, that lacks the Zn-finger motif. Here, we report that DksA2 can functionally substitute for the canonical DksA in vivo in Escherichia coli and P. aeruginosa. We also demonstrate that DksA2 affects transcription by the E. coli RNAP in vitro similarly to DksA. The dksA2 gene is positioned downstream of a putative Zur-binding site. Accordingly, we show that dksA2 expression is repressed by the presence of exogenous Zn, deletion of Zur results in constitutive expression of dksA2, and Zur binds specifically to the promoter region of dksA2. We also found that deletion of dksA2 confers a growth defect in the absence of Zn. Our data suggest that DksA2 plays a role in Zn homeostasis and serves as a back-up copy of the canonical Zn-dependent DksA in Zn-poor environments.
DksA; Pseudomonas aeruginosa; Zur; zinc
Identifying functions for all gene products in all sequenced organisms is a central challenge of the post-genomic era. However, at least 30-50% of the proteins encoded by any given genome are of unknown or vaguely known function, and a large number are wrongly annotated. Many of these ‘unknown’ proteins are common to prokaryotes and plants. We set out to predict and experimentally test the functions of such proteins. Our approach to functional prediction integrates comparative genomics based mainly on microbial genomes with functional genomic data from model microorganisms and post-genomic data from plants. This approach bridges the gap between automated homology-based annotations and the classical gene discovery efforts of experimentalists, and is more powerful than purely computational approaches to identifying gene-function associations.
Among Arabidopsis genes, we focused on those (2,325 in total) that (i) are unique or belong to families with no more than three members, (ii) occur in prokaryotes, and (iii) have unknown or poorly known functions. Computer-assisted selection of promising targets for deeper analysis was based on homology-independent characteristics associated in the SEED database with the prokaryotic members of each family. In-depth comparative genomic analysis was performed for 360 top candidate families. From this pool, 78 families were connected to general areas of metabolism and, of these families, specific functional predictions were made for 41. Twenty-one predicted functions have been experimentally tested or are currently under investigation by our group in at least one prokaryotic organism (nine of them have been validated, four invalidated, and eight are in progress). Ten additional predictions have been independently validated by other groups. Discovering the function of very widespread but hitherto enigmatic proteins such as the YrdC or YgfZ families illustrates the power of our approach.
Our approach correctly predicted functions for 19 uncharacterized protein families from plants and prokaryotes; none of these functions had previously been correctly predicted by computational methods. The resulting annotations could be propagated with confidence to over six thousand homologous proteins encoded in over 900 bacterial, archaeal, and eukaryotic genomes currently available in public databases.
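The counts reported above can be cross-checked with a short Python tally. All numbers are taken from the text; the variable names are illustrative, and reading the 19 correct predictions as the nine validated in-house plus the ten validated by other groups is an inference from those numbers, not something the abstract states explicitly.

```python
# Selection funnel reported in the text.
arabidopsis_genes = 2325     # genes meeting criteria (i)-(iii)
candidate_families = 360     # families analyzed in depth
metabolism_families = 78     # families connected to general areas of metabolism
specific_predictions = 41    # families with a specific functional prediction

# Status of the predictions tested by the authors' group.
tested = {"validated": 9, "invalidated": 4, "in_progress": 8}
validated_by_others = 10

total_tested = sum(tested.values())
# Assumed reading: correct predictions = in-house validated + independent.
confirmed = tested["validated"] + validated_by_others

print(f"tested or under investigation: {total_tested}")
print(f"confirmed predictions: {confirmed}")
print(f"share of analyzed families with a specific prediction: "
      f"{specific_predictions / candidate_families:.1%}")
```

Under this reading, the totals of 21 tested predictions and 19 correctly predicted functions are mutually consistent.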
The EKC/KEOPS complex is universally conserved in Archaea and Eukarya and has been implicated in several cellular processes, including transcription, telomere homeostasis and genomic instability. However, the molecular function of the complex has remained elusive so far. We analyzed the transcriptome of EKC/KEOPS mutants and observed a specific profile that is highly enriched in targets of the Gcn4p transcriptional activator. GCN4 expression was found to be activated at the translational level in mutants via the defective recognition of the inhibitory upstream ORFs (uORFs) present in its leader. We show that EKC/KEOPS mutants are defective for the N6-threonylcarbamoyl adenosine modification at position 37 (t6A37) of tRNAs decoding ANN codons, which affects initiation at the inhibitory uORFs and provokes Gcn4 de-repression. Structural modeling reveals similarities between Kae1 and bacterial enzymes involved in carbamoylation reactions analogous to t6A37 formation, supporting a direct role for the EKC in tRNA modification. These findings are further supported by strong genetic interactions of EKC mutants with a translation initiation factor and with threonine biosynthesis genes. Overall, our data provide a novel twist to understanding the primary function of the EKC/KEOPS and its impact on several essential cellular functions like transcription and telomere homeostasis.
As the molecular adapters between codons and amino acids, transfer-RNAs are pivotal molecules of the genetic code. The coding properties of a tRNA molecule do not reside only in its primary sequence. Posttranscriptional nucleoside modifications, particularly in the anticodon loop, can modify cognate codon recognition, affect aminoacylation properties, or stabilize the codon-anticodon wobble base pairing to prevent ribosomal frameshifting. Despite a wealth of biophysical and structural knowledge of the tRNA modifications themselves, their pathways of biosynthesis had been until recently only partially characterized. This discrepancy was mainly due to the lack of obvious phenotypes for tRNA modification–deficient strains and to the difficulty of the biochemical assays used to detect tRNA modifications. However, the availability of hundreds of whole-genome sequences has allowed the identification of many of these missing tRNA-modification genes. This chapter reviews the methods that were used to identify these genes with a special emphasis on the comparative genomic approaches. Methods that link gene and function but do not rely on sequence homology will be detailed, with examples taken from the tRNA modification field.
Like other forms of engineering, metabolic engineering requires knowledge of the components (the ‘parts list’) of the target system. Lack of such knowledge impairs both rational engineering design and diagnosis of the reasons for failures; it also poses problems for the related field of metabolic reconstruction, which uses a cell’s parts list to recreate its metabolic activities in silico. Despite spectacular progress in genome sequencing, the parts lists for most organisms that we seek to manipulate remain highly incomplete, due to the dual problem of ‘unknown’ proteins and ‘orphan’ enzymes. The former are all the proteins deduced from genome sequence that have no known function, and the latter are all the enzymes described in the literature (and often catalogued in the EC database) for which no corresponding gene has been reported. Unknown proteins constitute up to about half of the proteins in prokaryotic genomes, and much more than this in higher plants and animals. Orphan enzymes make up more than a third of the EC database. Attacking the ‘missing parts list’ problem is accordingly one of the great challenges for post-genomic biology, and a tremendous opportunity to discover new facets of life’s machinery. Success will require a co-ordinated community-wide attack, sustained over years. In this attack, comparative genomics is probably the single most effective strategy, for it can reliably predict functions for unknown proteins and genes for orphan enzymes. Furthermore, it is cost-efficient and increasingly straightforward to deploy owing to a proliferation of databases and associated tools.
comparative genomics; metabolic reconstruction; orphan enzyme; pathway hole; unknown protein
With the availability of a genome sequence and increasingly sophisticated genetic tools, Haloferax volcanii is becoming a model for both Archaea and halophiles. In order for H. volcanii to reach a status equivalent to Escherichia coli, Bacillus subtilis, or Saccharomyces cerevisiae, a gene knockout collection needs to be constructed in order to identify the archaeal essential gene set and enable systematic phenotype screens. A streamlined gene-deletion protocol adapted for potential automation was implemented and used to generate 22 H. volcanii deletion strains and identify several potentially essential genes. These gene deletion mutants, generated in this and previous studies, were then analyzed in a high-throughput fashion to measure growth rates in different media and temperature conditions. We conclude that these high-throughput methods are suitable for a rapid investigation of an H. volcanii mutant library and suggest that they should form the basis of a larger genome-wide experiment.
Tetrahydromonapterin is a major pterin in Escherichia coli and is hypothesized to be the cofactor for phenylalanine hydroxylase (PhhA) in Pseudomonas aeruginosa, but neither its biosynthetic origin nor its cofactor role has been clearly demonstrated. A comparative genomics analysis implicated the enigmatic folX and folM genes in tetrahydromonapterin synthesis via their phyletic distribution and chromosomal clustering patterns. folX encodes dihydroneopterin triphosphate epimerase, which interconverts dihydroneopterin triphosphate and dihydromonapterin triphosphate. folM encodes an unusual short-chain dehydrogenase/reductase known to have dihydrofolate and dihydrobiopterin reductase activity. The roles of FolX and FolM were tested experimentally first in E. coli, which lacks PhhA and in which the expression of P. aeruginosa PhhA plus the recycling enzyme pterin 4a-carbinolamine dehydratase, PhhB, rescues tyrosine auxotrophy. This rescue was abrogated by deleting folX or folM and restored by expressing the deleted gene from a plasmid. The folX deletion selectively eliminated tetrahydromonapterin production, which far exceeded folate production. Purified FolM showed high, NADPH-dependent dihydromonapterin reductase activity. These results were substantiated in P. aeruginosa by deleting tyrA (making PhhA the sole source of tyrosine) and folX. The ΔtyrA strain was, as expected, prototrophic for tyrosine, whereas the ΔtyrA ΔfolX strain was auxotrophic. As in E. coli, the folX deletant lacked tetrahydromonapterin. Collectively, these data establish that tetrahydromonapterin formation requires both FolX and FolM, that tetrahydromonapterin is the physiological cofactor for PhhA, and that tetrahydromonapterin can outrank folate as an end product of pterin biosynthesis.
GTP cyclohydrolase I (GCYH-I) is an essential Zn2+-dependent enzyme that catalyzes the first step of the de novo folate biosynthetic pathway in bacteria and plants, the 7-deazapurine biosynthetic pathway in Bacteria and Archaea, and the biopterin pathway in mammals. We recently reported the discovery of a new prokaryotic-specific GCYH-I (GCYH-IB) that displays no sequence identity to the canonical enzyme and is present in ∼25% of bacteria, the majority of which lack the canonical GCYH-I (renamed GCYH-IA). Genomic and genetic analyses indicate that in those organisms possessing both enzymes, e.g., Bacillus subtilis, GCYH-IA and -IB are functionally redundant, but differentially expressed. Whereas GCYH-IA is constitutively expressed, GCYH-IB is expressed only under Zn2+-limiting conditions. These observations are consistent with the hypothesis that GCYH-IB functions to allow folate biosynthesis during Zn2+ starvation. Here, we present biochemical and structural data showing that bacterial GCYH-IB, like GCYH-IA, belongs to the tunneling-fold (T-fold) superfamily. However, the GCYH-IA and -IB enzymes exhibit significant differences in global structure and active-site architecture. While GCYH-IA is a unimodular, homodecameric, Zn2+-dependent enzyme, GCYH-IB is a bimodular, homotetrameric enzyme activated by a variety of divalent cations. The structure of GCYH-IB and the broad metal dependence exhibited by this enzyme further underscore the mechanistic plasticity that is emerging for the T-fold superfamily. Notably, while humans possess the canonical GCYH-IA enzyme, many clinically important human pathogens possess only the GCYH-IB enzyme, suggesting that this enzyme is a potential new molecular target for antibacterial development.
The bacterial elongation factor P (EF-P) is strictly conserved in bacteria and essential for protein synthesis. It is homologous to the eukaryotic translation initiation factor 5A (eIF5A). A highly conserved eIF5A lysine is modified into an unusual amino acid derived from spermidine, hypusine. Hypusine is absolutely required for eIF5A's role in translation in Saccharomyces cerevisiae. The homologous lysine of EF-P is also modified to a spermidine derivative in Escherichia coli. However, the biosynthesis pathway of this modification in the bacterial EF-P is yet to be elucidated.
Presentation of the hypothesis
Here we propose a potential mechanism for the post-translational modification of EF-P. By using comparative genomic methods based on physical clustering and phylogenetic pattern analysis, we identified two protein families of unknown function, encoded by yjeA and yjeK genes in E. coli, as candidates for this missing pathway. Based on the analysis of the structural and biochemical properties of both protein families, we propose two potential mechanisms for the modification of EF-P.
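Phylogenetic pattern analysis of the kind mentioned above can be illustrated with a small sketch: each protein family is represented by the set of genomes in which it occurs, and families are ranked by how closely their distribution matches that of a query family. The Jaccard index used here is one common similarity measure, chosen for illustration; the source does not specify the scoring scheme or tools used. The genome labels and presence/absence data below are invented placeholders.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two sets of genome identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_cooccurrence(
    query: str, profiles: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Rank every other family by the similarity of its phyletic profile
    to that of the query family, most similar first."""
    q = profiles[query]
    scores = [
        (family, jaccard(q, genomes))
        for family, genomes in profiles.items()
        if family != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy presence/absence data (invented): g1..g6 are placeholder genomes.
profiles = {
    "efp": {"g1", "g2", "g3", "g4", "g5", "g6"},  # present everywhere
    "yjeA": {"g1", "g2", "g3"},
    "yjeK": {"g1", "g2", "g3"},
    "unrelated": {"g4", "g5"},
}
ranking = rank_by_cooccurrence("yjeA", profiles)
```

In this toy data, yjeK shares the exact distribution of yjeA and therefore ranks first, while the universally present efp scores lower. In practice, such a signal would be combined with physical clustering evidence, as the abstract describes, before proposing a functional link.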
Testing the hypothesis
This hypothesis could be tested genetically by constructing a bacterial strain with a tagged efp gene. The tag would allow the purification of EF-P by affinity chromatography and the analysis of the purified protein by mass spectrometry. yjeA or yjeK could then be deleted in the efp-tagged strain, and the EF-P protein purified from each mutant could be analyzed by mass spectrometry for the presence or absence of the modification. This hypothesis could also be tested by purifying the different components (YjeK, YjeA and EF-P) and reconstituting the pathway in vitro.
Implication of the hypothesis
The requirement for a fully modified EF-P for protein synthesis in certain bacteria implies the presence of a specific post-translational modification mechanism in these organisms. All of the 725 bacterial genomes analyzed possess an efp gene, but only 200 (28%) possess both the yjeA and yjeK genes. In the other organisms, either EF-P is modified by another pathway or the translation machinery has adapted to the lack of EF-P modification. Our hypotheses, if confirmed, will lead to the discovery of a new post-translational modification pathway.
This article was reviewed by Céline Brochier-Armanet, Igor B. Zhulin and Mikhail Gelfand. For the full reviews, please go to the Reviewers' reports section.