The genome of N. europaea ATCC 19718 consists of a single circular chromosome of 2,812,094 bp (the general features of the genome are listed in Table , and a detailed map is shown in Fig. ); nucleotide 1 was assigned at the predicted origin of replication. GC skew analysis reveals that the genome is clearly divided into two unequal replichores (roughly one-third and two-thirds of the chromosome). The mechanism and biological reason for this asymmetry are unclear, but it may result from the unusually large amount of repetitive DNA in the genome and the presumed fluidity of the genome. One clear effect of this bias is an ~1% difference between the G and C content of a given strand (25.84% versus 24.87%). Another peculiarity is that the multiple copies of the genes participating in ammonia oxidation (amoCAB and hao), along with a large (7.5-kb) recently duplicated region, all lie on the larger replichore (see Fig. ). Overall, the N. europaea genome is 50.7% G+C. Several G+C spikes are observed, but these are not thought to have been acquired via lateral transfer. However, some of these G+C spikes do coincide with GC skew spikes. Interestingly, some of them also correspond to regions containing repeated genes and operons, such as the hao region (HAO and associated cytochromes), amoCAB (the AMO operon), and tufB (elongation factor Tu), all of which are themselves associated with several ribosomal protein genes (not shown).
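GC skew and cumulative GC skew are standard ways to locate replichore boundaries. The sketch below is a minimal illustration under stated assumptions, not the pipeline used to annotate this genome: it computes (G − C)/(G + C) in non-overlapping windows, takes the minimum of the cumulative skew as the origin and the maximum as the terminus (a common convention), and reports the two arc lengths of the circular chromosome. The function names and the window size are illustrative; real analyses typically use kilobase-scale, often sliding, windows.

```python
from itertools import accumulate


def gc_skew_windows(seq: str, window: int) -> list[float]:
    """GC skew, (G - C) / (G + C), for consecutive non-overlapping windows of one strand."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def replichores(seq: str, window: int) -> tuple[int, int, int, int]:
    """Estimate origin, terminus, and both replichore lengths of a circular chromosome.

    Assumes cumulative skew is minimal near the origin and maximal near the terminus.
    Positions are 0-based and rounded to the end of the window where the extreme occurs.
    """
    cumulative = list(accumulate(gc_skew_windows(seq, window)))
    i_min = min(range(len(cumulative)), key=cumulative.__getitem__)
    i_max = max(range(len(cumulative)), key=cumulative.__getitem__)
    origin = (i_min + 1) * window
    terminus = (i_max + 1) * window
    n = len(seq)
    arc = (terminus - origin) % n  # origin -> terminus in the direction of increasing coordinates
    return origin, terminus, arc, n - arc


def strand_composition(seq: str) -> tuple[float, float]:
    """Percent G and percent C on the given strand."""
    seq = seq.upper()
    return 100 * seq.count("G") / len(seq), 100 * seq.count("C") / len(seq)


toy = "GCCC" * 10 + "GGGC" * 60 + "GCCC" * 30  # C-rich, then G-rich, then C-rich
print(replichores(toy, 4))  # (40, 280, 240, 160): unequal arcs of 240 bp and 160 bp
```

The same strand-composition function, applied to one strand of a real chromosome, yields the kind of G versus C percentages quoted above.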
FIG. 1. Circular representation of the 2,812,094-bp genome of N. europaea ATCC 19718. The outer two circles represent protein-encoding and structural-RNA genes, plus and minus strand (green, energy metabolism; red, DNA replication; magenta, transcription; yellow, ...
Genes are distributed evenly around the genome, with ~47% transcribed from the forward strand and ~53% from the complementary strand. Gene modeling predicted a total of 2,460 protein-encoding genes, averaging 1,011 bp in length, with intergenic regions averaging 117 bp. These open reading frames (ORFs) account for 2,487,261 nucleotides of coding sequence (88.4% of the genome). An additional 113 ORFs are fragmentary, frameshifted, or interrupted by insertion sequence (IS) elements; these have been designated pseudogenes. Of the 2,460 putative proteins, 2,147 matched a sequence in the NR database with an E value of <1e-5; of these, 1,863 are similar to a protein with a functional assignment, and 285 match a protein of unknown function. An additional 312 (13%) are unique to N. europaea. Thus, roughly 75% of the predicted proteins can potentially be assigned a function. Other searches give similar results: 1,737 proteins match InterPro profiles, 1,967 match a Pfam HMM profile (default threshold), and 1,960 can be assigned to a COG group (3BeTs). In addition to the protein-encoding genes, we identified 41 tRNA genes representing all 20 amino acids. The single rRNA operon in this strain is of the 16S-Ala tRNATGC-Ile tRNAGAT-23S-5S type and contains the typical I-CeuI endonuclease site in the 23S rRNA gene.
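The coding-density and annotation figures above can be cross-checked with simple arithmetic using only the numbers given in the text (the product of gene count and mean length is approximate because the mean length is rounded):

```latex
\begin{aligned}
\frac{2{,}487{,}261\ \text{bp}}{2{,}812{,}094\ \text{bp}} &\approx 0.884 \quad (88.4\%\ \text{coding}) \\
2{,}460 \times 1{,}011\ \text{bp} &= 2{,}487{,}060\ \text{bp} \approx 2{,}487{,}261\ \text{bp} \\
\frac{1{,}863}{2{,}460} &\approx 0.757 \quad (\text{functional assignment, roughly } 75\%) \\
\frac{312}{2{,}460} &\approx 0.127 \quad (\text{unique to } \textit{N. europaea}\text{, about } 13\%)
\end{aligned}
```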
Gene homolog distribution.
The sequence of this third β-proteobacterium makes a fine addition to the repertoire of structural diversity and gene content observed in the genomes sequenced thus far. More than 130 organisms were represented among the top BLAST hits to one or more genes. Ralstonia solanacearum, one of the two β-proteobacterial genome sequences currently in the public databases, was most often the top BLAST hit (31%, 768 of the 2,460 predicted ORFs). Neisseria meningitidis, the other sequenced β-proteobacterium, was the top hit for 5% of the genes. Surprisingly, 13% had as their top hit Pseudomonas aeruginosa, a primarily soil-dwelling γ-proteobacterium. Other proteobacteria, including common soil inhabitants, were often found at the top of BLAST lists. Genes from E. coli, Caulobacter crescentus, Sinorhizobium meliloti, Vibrio cholerae, Xylella fastidiosa, and Mesorhizobium loti were frequent top BLAST hits, matching 4.0, 2.9, 2.8, 2.6, 2.2, and 1.9% of N. europaea genes, respectively. At least 50 genes also had top hits in each of the following groups: cyanobacteria, gram-positive actinobacteria, and bacilli. The broad distribution of BLAST hits may reflect the small number of completed genome sequences from β-proteobacteria and the lack of any from other ammonia-oxidizing bacteria.
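The percentages above are tallies of the organism of each ORF's best BLAST hit, divided by all 2,460 predicted ORFs. A minimal sketch of that tally, assuming the best-hit organism for each ORF has already been extracted from the BLAST output (the input dictionary and the function name are hypothetical, not part of any published pipeline):

```python
from collections import Counter
from typing import Optional


def top_hit_distribution(top_hits: dict[str, Optional[str]]) -> list[tuple[str, int, float]]:
    """Tally the organism of each ORF's best BLAST hit.

    `top_hits` maps every predicted ORF to the organism of its best significant hit,
    or None if it had no significant hit; percentages use all ORFs as the denominator.
    """
    total = len(top_hits)
    counts = Counter(org for org in top_hits.values() if org is not None)
    return [(org, n, 100.0 * n / total) for org, n in counts.most_common()]


hits = {
    "orf1": "Ralstonia solanacearum",
    "orf2": "Ralstonia solanacearum",
    "orf3": "Pseudomonas aeruginosa",
    "orf4": None,  # no significant hit: counted in the denominator only
}
for org, n, pct in top_hit_distribution(hits):
    print(f"{org}: {n} ({pct:.1f}%)")
# Ralstonia solanacearum: 2 (50.0%)
# Pseudomonas aeruginosa: 1 (25.0%)
```

With the numbers in the text, 768 of 2,460 ORFs corresponds to 31.2%, the figure reported for R. solanacearum.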
Hirota et al. (36) conducted pulsed-field gel electrophoresis experiments to localize both copies of amoCAB and all three copies of hao to a single 487-kb fragment of DNA in Nitrosomonas sp. strain ENI-11. Based on restriction maps followed by long-range PCR, each copy of amo in strain ENI-11 was found to lie within 15 or 23 kb of a copy of hao. The genome sequence of N. europaea ATCC 19718 reveals nearly identical distances (15.5 and 23.1 kb) between each of these two important ammonia oxidation gene clusters and a copy of hao. The genes lying between each amo cluster and its neighboring hao copy in Nitrosomonas sp. strain ENI-11 are also the same in N. europaea. These include the genes encoding threonyl-tRNA synthetase (thrS), initiation factor 3 (infC), ribosomal protein L20 (rplT), and the phenylalanyl-tRNA synthetase α and β subunits (pheS) in the 15.5-kb spacer region, and the RNA polymerase β and β′ subunits (rpoBC) in the 23.1-kb span. The complete 23.1-kb region also includes tRNA genes, several ribosomal protein genes, the genes for elongation factors G (fusA) and Tu (tufB), and the transcription antitermination gene nusG. Both genomes also have a third hao gene copy located approximately 300 kb upstream of the amo/hao cluster that has the 23-kb spacer, which suggests similar amo gene arrangements in these two Nitrosomonas strains (Fig. ). However, the two amo/hao clusters are separated by strikingly different distances (87 kb in ENI-11 versus 1,300 kb in N. europaea). Thus, although the overall arrangement of the two amo/hao gene clusters is conserved, dramatic rearrangements have nonetheless occurred since these two Nitrosomonas strains diverged.
FIG. 2. Arrangement of hao and amo in N. europaea and Nitrosomonas sp. strain ENI-11. Arrows indicate the orientation of the genes. The nomenclature of the genes in N. europaea is with respect to their KpnI fragment sizes for hao (38) and EcoRI fragment sizes ...
Complex repetitive sequences.
Like most ammonia-oxidizing bacteria examined, N. europaea has multiple copies of the genes coding for AMO, HAO, and cytochrome c554 (Table ). Whether other genes, and their associated functions, are also duplicated has long been a matter of speculation. In contrast to several other autotrophs, N. europaea does not have duplicated genes for ribulose 1,5-bisphosphate carboxylase/oxygenase (RubisCo). Likewise, the ribosomal genes are not duplicated, as they are in many other bacteria. One of the two copies of another duplicated gene, tufB (encoding elongation factor Tu), is associated with one of the two amo/hao gene clusters (the one with the 23-kb spacer [see Fig. and ]).
A perfect tandem repeat, possibly the product of a recent duplication event, may offer insight into the mechanisms of gene duplication and evolution, and it provides another example showing that gene duplication in N. europaea is not limited to genes involved in ammonia catabolism. This 7.5-kb duplication carries genes for phosphoenolpyruvate synthase (ppsA; NE2359 and NE2366) and glutaminyl-tRNA synthetase (glnS; NE2356 and NE2363), along with three conserved hypothetical proteins, as well as portions of the genes for lysyl-tRNA synthetase (lysS; NE2361) and aspartate aminotransferase (NE2362). These two likely “pseudogenes” have full-length counterparts at the left and right borders of this most obvious duplication event. A second, different type of complex tandem repeat consists of 15 copies of an ~339-bp degenerate repeat (nine variants) interspersed with four identical copies of a second repeat type (317 bp). This 6.3-kb repeat region lies within an ORF (NE0161), which remains intact throughout the length of the repeat region and encodes a large 3,064-amino-acid conserved hypothetical protein with a putative hemolysin-type calcium-binding region.
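A perfect tandem duplication such as the 7.5-kb region means that some segment u occurs twice in a row (uu). The brute-force sketch below finds the longest such perfect tandem in a short sequence. It is illustrative only: the function name is hypothetical, its roughly cubic running time makes it unsuitable for a 2.8-Mb chromosome (where suffix-array or seed-and-extend repeat finders would be used instead), and it does not detect degenerate repeats like those in NE0161, which require approximate matching.

```python
def longest_perfect_tandem(seq: str, min_unit: int = 1) -> tuple[int, int]:
    """Find the longest unit u such that u immediately followed by u occurs in seq.

    Returns (start, unit_length) of the leftmost such occurrence, or (-1, 0) if
    no tandem with unit_length >= min_unit exists. Brute force, for short sequences only.
    """
    n = len(seq)
    for k in range(n // 2, max(min_unit, 1) - 1, -1):  # try the longest units first
        for start in range(n - 2 * k + 1):
            if seq[start:start + k] == seq[start + k:start + 2 * k]:
                return start, k
    return -1, 0


print(longest_perfect_tandem("AC" + "GATTACA" * 2 + "T"))  # (2, 7)
```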
Besides the abundance of duplicated (presumably native) genomic material, the genome contains 85 predicted IS elements (most of them complete) representing eight different families not previously described (see Table ). Although these elements appear randomly distributed around the genome (Fig. ), closer inspection reveals seven IS-free islands between 100 and 200 kb in size, whereas the genome as a whole averages one IS element per 33 kb. The three smallest IS families (ISne6, ISne7, and ISne8) do in fact appear randomly distributed, but the IS families with 9 to 27 members appear to have preferentially inserted near other IS elements (or, alternatively, to have served as attractants for subsequent IS integrations). The ISne1 and ISne3 families displayed a relatively random integration pattern, yet half of their members lie in close proximity (<2 kb) to other IS elements. Although ISne4 and ISne5 encode similar transposases and share some sequence identity, they are not found near one another. Rather, four of the ten ISne4 elements are near ISne1 integrations and another two are close to ISne2 sites, whereas five of the nine ISne5 elements are very close to an ISne2 integration site. In fact, four of these are directly adjacent to ISne2 elements and, surprisingly, lie in three different orientations, indicating completely independent integration events. Eleven of the 16 ISne2 elements are themselves <2 kb from other IS elements. The reason for this striking integration bias is unknown. One possibility is that once a function is lost through IS integration, and the loss is not deleterious to the organism, further integrations into that gene would not be lethal and would be maintained in the population. This mechanism could extend to subsequent IS integrations in the surrounding region if nearby genes participate in the same function as the gene that received the first integration and are therefore also rendered nonfunctional by that event.
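The spacing statistics in this paragraph (mean density, IS-free islands, and elements within 2 kb of a neighbor) can be computed from IS element coordinates on the circular chromosome. A minimal sketch, assuming non-overlapping (start, end) coordinates that do not span nucleotide 1; the function name and default thresholds are illustrative choices, not values taken from the original analysis software:

```python
def is_spacing_stats(
    elements: list[tuple[int, int]],
    genome_length: int,
    island_min: int = 100_000,
    near: int = 2_000,
) -> tuple[float, list[tuple[int, int]], int]:
    """Summarize IS element spacing on a circular chromosome.

    Returns (mean spacing in bp, IS-free islands as (start, length) with length >= island_min,
    number of elements lying within `near` bp of a neighboring element).
    """
    if not elements:
        raise ValueError("need at least one element")
    elems = sorted(elements)
    n = len(elems)
    # gaps[i]: IS-free stretch from the end of element i to the start of the next one,
    # wrapping around the origin for the last element
    gaps = [(elems[(i + 1) % n][0] - elems[i][1]) % genome_length for i in range(n)]
    mean_spacing = genome_length / n
    islands = [(elems[i][1], gaps[i]) for i in range(n) if gaps[i] >= island_min]
    # gaps[i - 1] is the gap before element i (gaps[-1] wraps for i == 0)
    clustered = sum(1 for i in range(n) if gaps[i - 1] < near or gaps[i] < near)
    return mean_spacing, islands, clustered


print(is_spacing_stats([(150_000, 151_000), (1_000, 2_000), (3_000, 4_000)], 300_000))
# (100000.0, [(4000, 146000), (151000, 150000)], 2)
print(round(2_812_094 / 85))  # 33083
```

For the whole genome, 2,812,094 bp divided by 85 elements gives about 33,083 bp, matching the average of one IS element per 33 kb quoted above.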
A different class of duplicated genes includes likely members of a recently diverged paralogous family that encodes, or regulates the production of, Fe-siderophore receptors. Several classes of siderophore receptors, along with regulatory genes, were identified in the genome. Together, all of the repeated elements constitute ~5% of the genome sequence, placing this genome among the bacterial genomes most densely populated with complex repetitive DNA.
The use of ammonia as an energy source, an example of lithotrophy, requires the ability to catabolize ammonia, to generate reductant for biosynthesis, and to generate a chemiosmotic gradient to drive ATP synthesis (Fig. ). The genes coding for AMO (amoCAB), HAO (hao), and cytochrome c554 were previously sequenced (7) and have been confirmed by the genome sequence. No additional genes that might be involved in the oxidation of ammonia to nitrite were identified. Electrons from hydroxylamine oxidation flow through cytochrome c554 and cytochrome cm552 into the electron transport chain at the level of ubiquinone (Fig. ) (79). Genes for a typical ubiquinone-cytochrome c oxidoreductase and an aa3-type cytochrome c oxidase are present; the soluble monoheme cytochrome c552 is thought to mediate electron transfer between the two enzymes. The gene for cytochrome c552 is not located in an energetics-related gene cluster. Contributions to the proton gradient include scalar protons released by the HAO reaction and proton translocation by the bc1 complex and the terminal oxidase. Genes for all subunits of a typical ATP synthase are present. Reduction of NAD+ requires proton-driven reverse electron flow, presumably through NADH-ubiquinone oxidoreductase, which is encoded in a discrete gene cluster. Reduction of NADP+ may be carried out by the proton gradient-dependent NAD+ transhydrogenase, also encoded in a gene cluster. The energy cost of reducing NADP+ is thus very high. Two gene clusters encode an Na+-dependent NADH-ubiquinone reductase and an Na+ antiporter, which may support some Na+-driven secondary transporters and may also be significant in marine environments.
FIG. 3. Diagram of N. europaea cell. Processes primarily associated with the generation of a proton gradient are indicated on the bottom, and processes associated with utilization of the gradient are on the other sides. ATP and NADH production are indicated on ...
The genome reveals that N. europaea has relatively few alternative paths to terminal electron acceptors. Only one type of terminal oxidase, of the aa3 family, is present. The only other potential terminal oxidase is the soluble, periplasmic copper enzyme noted below. The apparent presence of a membrane-anchored cytochrome c4 suggests that electrons from ubiquinol may, at times, bypass the bc1 complex on their way to cytochrome oxidase.
N. europaea is incapable of reducing nitrate but can reduce nitrite, forming nitric and nitrous oxides but not dinitrogen (reviewed in reference 41). No full ORFs with strong similarity to known dissimilatory nitrate reductases (EC 1.7.99.4) or nitrous oxide reductases (EC 1.7.99.6) were identified, a finding consistent with the biochemical evidence. A cluster of genes encodes periplasmic proteins that appear to be involved in the transfer of electrons to, and the reduction of, nitrite and/or oxygen (Fig. ). The first gene encodes an aerobically expressed, soluble “blue-copper oxidase” (24). The next two genes encode a soluble monoheme c cytochrome and a diheme c cytochrome, respectively. The fourth gene in this cluster best matches the aniA gene of Neisseria gonorrhoeae, which encodes an inducible nitrite reductase (PAN1) whose function was verified by mutational inactivation of nitrite reduction (56). The putative nitrite reductases from other Neisseria spp. and N. europaea all share significant similarity with conserved domains of copper-containing nitrite reductases from several denitrifying bacteria but form a separate clade in the nirK phylogeny. A 24-amino-acid signal sequence is predicted. N. europaea appears to have a form of nirK distinct from those of other ammonia oxidizers of the β subdivision (17). Disruption of the putative nirK did not abolish nitrous oxide production but did increase sensitivity to nitrite (5). The NO-reducing system is encoded in a nor gene cluster (norCBQD; NE2003, NE2004, NE2005, and NE2006) whose organization is similar to that found in Pseudomonas sp. strain G-179 (6). Anaerobic metabolism of N. europaea has been reported with pyruvate as the reductant and nitrite as the terminal electron acceptor (1). The genome encodes the enzymes necessary for this process, mediated by pyruvate dehydrogenase and possibly the citric acid cycle. Under these conditions, electrons apparently pass to nitrite reductase by way of NADH-ubiquinone reductase. Under anaerobic conditions, Nitrosomonas eutropha and N. europaea are reported to oxidize H2 and reduce nitrite, although the rate was much higher in N. eutropha. A gene for hydrogenase was not identified in the genome of N. europaea.
FIG. 4. N. europaea gene clusters. Several gene clusters described in the text are diagrammed. Each arrow represents a gene in the cluster. The N. europaea gene numbers are above each arrow and the gene names (or other identifiers) are below each arrow. In panel ...
Assimilation of carbon dioxide is initiated by a type I RubisCo (Fig. ). The genes for this enzyme are most similar to those from Acidithiobacillus ferrooxidans, another obligate lithoautotroph. A carbonic anhydrase gene (cynT) lies next to a gene for an anion transporter and only 4.6 kb from the RubisCo genes (Fig. ). If this transporter carries carbonate or bicarbonate, then it and carbonic anhydrase would together promote the accumulation of CO2, the substrate of RubisCo. Genes for three additional carbonic anhydrases were also identified. CO2-repressible carbonic anhydrase activity has been observed in Nitrosomonas. In Ralstonia eutropha H16, inactivation of the gene encoding carbonic anhydrase left the organism unable to grow at ambient CO2. N. europaea does not have genes associated with the production of carboxysomes (16).
With the exception of two enzymes, genes for all enzymes of the Calvin-Benson-Bassham cycle are present. A gene for sedoheptulose 1,7-bisphosphatase (EC 3.1.3.37) is absent. However, the fructose 1,6-bisphosphatase (EC 3.1.3.11) encoded by NE0521 may have higher activity with sedoheptulose 1,7-bisphosphate and may function primarily in its hydrolysis rather than in gluconeogenesis (see below). This was found to be the case with the highly similar enzymes from Ralstonia metallidurans (formerly Alcaligenes eutrophus) and Xanthobacter flavus. The gene encoding NADPH-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.13) is also absent and is apparently replaced by a gene for the NADH-dependent enzyme (EC 1.2.1.12), “borrowed” from gluconeogenesis and glycolysis and shared with them, as has been observed in other chemoautotrophs (67). Because there is unlikely to be an extended period during growth of this obligate autotroph when CO2 fixation and gluconeogenesis are not both occurring, there is little advantage in having two separate genes that can be regulated independently. In fact, energy is conserved by having only one enzyme. Metabolic regulation at the enzyme level can still provide the appropriate flux through gluconeogenesis and glycolysis.
A very significant energy saving is achieved by using NADH rather than NADPH (which is generated from NADH in an energy-dependent reaction) to reduce 3-phosphoglycerate. Interestingly, an obligate autotroph lacking catabolic pathways had no reason to adopt a system of metabolic regulation that depends on the separate use of NAD+ and NADP+ as redox mediators and effectors in catabolic and anabolic pathways, respectively, as is seen in the Eukaryota. Nevertheless, it continues to use both NADH and NADPH for biosynthesis.
Genes for the enzymes common to gluconeogenesis and glycolysis are present. From an energetic point of view, it seems likely that the flux through either pathway is limited to biosynthetic requirements or the recycling of fixed carbon. As noted above, the gluconeogenic enzyme fructose 1,6-bisphosphatase (EC 3.1.3.11) likely has a substrate specificity suited to the hydrolysis of sedoheptulose 1,7-bisphosphate. The conversion of fructose 1,6-bisphosphate to fructose 6-phosphate may instead be carried out by a pyrophosphate-dependent 6-phosphofructokinase (EC 2.7.1.90), i.e., fructose 1,6-bisphosphate + Pi → fructose 6-phosphate + PPi. Notably, the adjacent gene encodes a pyrophosphatase whose action would “pull” gluconeogenesis. The same pyrophosphate-dependent 6-phosphofructokinase, which is reversible, may also catalyze the production of fructose 1,6-bisphosphate in glycolysis (participating in a pyrophosphate-dependent energy economy). No genes with high sequence similarity to ATP-dependent 6-phosphofructokinases (EC 2.7.1.11) were found.
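The “pull” exerted by the pyrophosphatase follows from mass action. For the pyrophosphate-dependent reaction written above, standard thermodynamics gives the relation below; no measured concentrations or ΔG values for N. europaea are implied.

```latex
\begin{aligned}
\text{fructose 1,6-bisphosphate} + \mathrm{P_i} &\rightleftharpoons \text{fructose 6-phosphate} + \mathrm{PP_i} \\
\Delta G &= \Delta G^{\circ\prime} + RT \ln \frac{[\text{fructose 6-phosphate}]\,[\mathrm{PP_i}]}{[\text{fructose 1,6-bisphosphate}]\,[\mathrm{P_i}]} \\
\mathrm{PP_i} + \mathrm{H_2O} &\longrightarrow 2\,\mathrm{P_i} \quad (\text{pyrophosphatase})
\end{aligned}
```

By hydrolyzing PPi, the pyrophosphatase lowers [PPi] and raises [Pi]; both changes decrease the reaction quotient and make ΔG more negative in the gluconeogenic direction.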
Genes for all enzymes of the tricarboxylic acid (TCA) cycle were identified. The predominant theory for the basis of obligate autotrophy holds that the TCA cycle is incomplete in these organisms (68). Indeed, in many autotrophs, including N. europaea, α-ketoglutarate dehydrogenase activity was not detected or was very low, in contrast to the other activities of the TCA cycle. However, genes for subunits E1, E2, and E3 (the last shared with pyruvate dehydrogenase) are present. It remains to be seen whether the genes for α-ketoglutarate dehydrogenase are expressed.
N. europaea does not synthesize glycogen or β-hydroxybutyrate as storage products but does accumulate polyphosphate when growth is limited by low pH (72). The gene for polyphosphate kinase (NE0323) found in the genome has very high sequence similarity to the equivalent gene in N. meningitidis.
Biochemical evidence suggests that ammonia is assimilated via glutamate dehydrogenase (40), and this pathway is consistent with the presence of a gene coding for an NADPH-specific glutamate dehydrogenase. Also consistent with the biochemical evidence (11), a glnA homolog, which encodes glutamine synthetase, is present. Although a putative glnE, which encodes the glutamine synthetase adenylyltransferase, was found, homologs of glnD (PII uridylyltransferase), glnB (PII), or glnK (alternative PII) were not identified. The absence of a gene encoding a PII protein is surprising given the broad distribution of this regulatory protein. Glutamate synthase activity was not observed previously (11). Although a gene with similarity to glutamate synthase is present, it is truncated and appears to encode only the domains that transfer ammonia to α-ketoglutarate and that accept electrons from a donor and transfer them to the reductive amination domain. The domain responsible for releasing ammonia from glutamine was not identified in this truncated gene or elsewhere in the genome. The gene profile supports the biochemical evidence that ammonia is assimilated via glutamate dehydrogenase, whereas the role of glutamine synthetase is to produce glutamine.
An ammonium transporter is present, which at low pH may supplement the passive uptake of ammonium. Nitrosomonas is reported to assimilate nitrite-N but not nitrate-N (62). The observed assimilation of nitrite-N presumably involves the siroheme-containing sulfite or nitrite reductase that is encoded in the genome. It is interesting that this frugal autotroph would expend reducing power on the production of ammonium.
Genes encoding the classical urease (e.g., ureABC, coding for urea amidohydrolase) are not present. A candidate for urea metabolism in N. europaea is a variant of urea amidolyase. However, the N. europaea gene is shorter than the one found in S. cerevisiae (a well-studied system [15]) and appears to encode only the carboxylase function of this bifunctional enzyme, not the hydrolase/amidase function. No good homologs of the hydrolase/amidase were identified. To our knowledge, there are no reports indicating that urea can support the growth of N. europaea.
Autotrophy requires the ability to synthesize most (and, in the case of N. europaea, all) required small molecules and macromolecules from inorganic constituents. The gene profile of N. europaea is consistent with this requirement. In general, most and often all of the genes needed for particular biosynthetic pathways can be identified. Fatty acid and lipid synthesis, production of cofactors and prosthetic groups, nucleic acid synthesis, and amino acid synthesis can all be accounted for by the gene profile. For example, similarity searches revealed many genes for the synthesis of purines and pyrimidines. For purine synthesis, genes were present for the de novo synthesis of adenosine and guanosine phosphates and their deoxy derivatives; genes were identified for all 23 steps required for the synthesis of ATP, dATP, GTP, and dGTP from ribose-5-phosphate. For pyrimidine synthesis, complete pathways from carbamoyl phosphate to UTP, dTTP, CTP, and dCTP were identified.
The genes for the enzymes needed to synthesize fatty acids up to hexadecanoate from acetyl coenzyme A (acetyl-CoA) were identified. N. europaea has three 3-ketoacyl-ACP synthases that catalyze the initial condensation reaction between acetyl-CoA and malonyl-CoA. Both synthase I (encoded by fabB) and synthase II (encoded by fabF) can elongate saturated fatty acids; however, only synthase I can catalyze the synthesis of unsaturated fatty acids. Synthase III (encoded by fabH) is involved in branched-chain fatty acid biosynthesis. As in other bacteria, several genes involved in fatty acid synthesis in N. europaea are arranged in an operon containing fabF (acyl carrier protein), fabG (3-ketoacyl-ACP reductase), fabD (malonyl-CoA ACP transacylase), fabH, and plsX (undefined role in fatty acid biosynthesis). No match for enoyl-ACP hydratase was found in the genome. However, a 3-hydroxymyristoyl/2-hydroxydecanoyl ACP hydratase (encoded by fabZ) was found that has a broad substrate range, including short-chain as well as saturated and unsaturated long-chain fatty acids (35). Genes for the synthesis of squalene from dimethylallyl-PP and isopentenyl-PP were identified. In addition, the gene for the branch point leading from squalene to hopane synthesis, squalene-hopene synthase, is present. However, the gene for the previous enzymatic step, squalene epoxidase, was not found.
Amino acid biosynthetic pathways are among the most conserved and allow comparisons of gene organization among organisms. Genes for the synthesis of aromatic amino acids and histidine illustrate this point (Fig. ). Phe, Tyr, and Trp are synthesized from phosphoenolpyruvate and erythrose-4-phosphate via the shikimic acid pathway. Genes for all enzymes required for chorismate synthesis and the tryptophan branch were identified. Tryptophan is synthesized from chorismate via anthranilate. All elements of the pathway were identified and are found in three clusters in the genome. The genes for anthranilate synthase component I (EC 4.1.3.27; trpE) and anthranilate phosphoribosyltransferase (EC 2.4.2.18; trpD) likely encode a bifunctional enzyme, as do those for N-(5′-phosphoribosyl)anthranilate isomerase (EC 5.3.1.24; trpF) and indole-3-glycerol-phosphate synthase (EC 4.1.1.48; trpC). The genes for the tryptophan synthase α (EC 4.2.1.20; trpA) and β (trpB) chains are contiguous.
Phenylalanine could be produced via the nonarogenate branch from chorismate to phenylpyruvate by a dual-function chorismate mutase-prephenate dehydratase (P protein) (NE0335). An aminotransferase (NE0336) would convert phenylpyruvate to phenylalanine. The P protein apparently also produces free prephenate. An aspartate aminotransferase (EC 2.6.1.1) could presumably transform prephenate to arogenate. No aromatic-amino-acid-specific aminotransferase (EC 2.6.1.57) was identified, but HisC2 has high similarity to a Pseudomonas stutzeri aminotransferase with broad specificity in the biosynthesis of histidine, phenylalanine, or tyrosine. In N. europaea, the prephenate dehydratase of the P protein is specific and does not metabolize arogenate (71). Therefore, arogenate is a precursor of tyrosine but not of phenylalanine. This observation is supported by the pathway identified in the genome sequence.
Tyrosine synthesis proceeds via prephenate, which is produced by the P protein. An aminotransferase then forms arogenate (see above). Arogenate dehydrogenase activity may reside in another dual-function enzyme combining arogenate dehydrogenase and prephenate dehydrogenase activities (TyrAc). However, Subramaniam et al. (71) observed that this enzyme was arogenate and NADP+ specific in N. europaea.
Therefore, arogenate would be an obligatory intermediate in tyrosine synthesis, and 4-hydroxyphenylpyruvate would not be formed. Subramaniam et al. (71) have shown that tyrosine is required for prephenate dehydratase activity and that phenylalanine is an inhibitor, thus ensuring balanced production of phenylalanine and tyrosine. For tyrosine to regulate phenylalanine synthesis, two separate paths must exist downstream of prephenate. Since the aspartate aminotransferase is probably promiscuous (acting on both phenylpyruvate and prephenate), this requires the P protein to be specific for prephenate (versus arogenate). Subramaniam et al. (71) noted that the use of arogenate as the precursor of tyrosine and of phenylpyruvate as the precursor of phenylalanine is characteristic of cyanobacteria and coryneform bacteria.
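The specificity argument for the phenylalanine and tyrosine branches can be checked mechanically by treating the reported reactions as a directed graph and enumerating the routes from chorismate. This is a minimal sketch, assuming only the reactions and enzyme specificities described in the text (the reaction list, its labels, and the function names are illustrative): with arogenate → phenylalanine and prephenate → 4-hydroxyphenylpyruvate absent, every route to tyrosine passes through arogenate and no route to phenylalanine does.

```python
from collections import defaultdict

# (substrate, product, enzyme) for the routes described in the text. Omitted on purpose,
# per the reported specificities: arogenate -> phenylalanine (the P protein does not use
# arogenate) and prephenate -> 4-hydroxyphenylpyruvate (TyrAc is arogenate specific).
REACTIONS = [
    ("chorismate", "prephenate", "P protein: chorismate mutase"),
    ("prephenate", "phenylpyruvate", "P protein: prephenate dehydratase"),
    ("phenylpyruvate", "phenylalanine", "aminotransferase (NE0336)"),
    ("prephenate", "arogenate", "aspartate aminotransferase (presumed)"),
    ("arogenate", "tyrosine", "TyrAc: arogenate dehydrogenase (NADP+)"),
]


def all_paths(reactions, source, target):
    """Enumerate all simple paths (lists of metabolites) from source to target."""
    graph = defaultdict(list)
    for substrate, product, _enzyme in reactions:
        graph[substrate].append(product)
    paths, stack = [], [(source, [source])]
    while stack:  # iterative depth-first search
        node, path = stack.pop()
        if node == target:
            paths.append(path)
            continue
        for nxt in graph[node]:
            if nxt not in path:  # never revisit a metabolite
                stack.append((nxt, path + [nxt]))
    return paths


def is_obligatory(reactions, source, target, intermediate):
    """True if target is reachable and every route to it passes through intermediate."""
    paths = all_paths(reactions, source, target)
    return bool(paths) and all(intermediate in p for p in paths)


print(is_obligatory(REACTIONS, "chorismate", "tyrosine", "arogenate"))  # True
print(all_paths(REACTIONS, "chorismate", "phenylalanine"))
# [['chorismate', 'prephenate', 'phenylpyruvate', 'phenylalanine']]

# Counterfactual: a prephenate dehydrogenase route would make arogenate optional.
with_hpp = REACTIONS + [
    ("prephenate", "4-hydroxyphenylpyruvate", "prephenate dehydrogenase (hypothetical)"),
    ("4-hydroxyphenylpyruvate", "tyrosine", "aminotransferase (hypothetical)"),
]
print(is_obligatory(with_hpp, "chorismate", "tyrosine", "arogenate"))  # False
```

The counterfactual shows why the reported arogenate and NADP+ specificity of TyrAc matters: if prephenate could be oxidized directly, arogenate would no longer be an obligatory intermediate.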
The histidine biosynthetic operon in N. europaea exhibits an overall organization (Fig. ) similar to that seen in N. meningitidis and E. coli, with some exceptions that help to place gene fusion events along the evolutionary lines of the proteobacteria. In N. europaea, most of the his genes are contiguous; however, the hisDG genes are separated from the rest of the operon. The hisI and hisE genes are not fused in N. europaea, but they are adjacent genes whose ORFs overlap. In the enteric bacteria (γ-proteobacteria) examined to date, the hisIE gene product is a bifunctional enzyme. Monofunctional enzymes encoded by hisI are commonly found in the β subdivisions. This observation would indicate that the hisIE gene fusion occurred after the enteric proteobacteria split from these subdivisions. The hisB gene of N. europaea is predicted to encode imidazoleglycerol-phosphate dehydratase (EC 4.2.1.19), while histidinol phosphatase (EC 3.1.3.15) is likely encoded by a separate gene (NE1185) outside the operon, as in R. solanacearum. This observation confirms that the fusion of these two activities occurred after the evolutionary split separating the γ subdivision from the other subdivisions of the proteobacteria (29). HitA matches a nucleotide-binding protein similar to members of the HIT (histidine triad) family. A second putative hisC gene (NE0647) encoding an aminotransferase was identified outside the his operon. However, this gene may be involved in the biosynthesis of other aromatic amino acids (see above).
The pathways for glycine, serine, and threonine synthesis were identified; these amino acids are presumed to be the starting materials for the synthesis of the osmoprotectants betaine and glycine betaine. The betA gene (NE1237), encoding choline dehydrogenase (EC 1.1.99.1) of the osmoregulatory choline-glycine betaine pathway, was identified, although the other genes found in the bet operon of proteobacteria were not found nearby. An ABC transport system for osmoprotectants may be encoded by a polyamine transport operon similar to potABCD in P. aeruginosa (NE1870 to NE1873).
Genes for aminoacyl-tRNA synthetases for all amino acids are present. Glutamyl-tRNA synthetase is encoded by gltX (NE1624). The duplicate glnS genes (NE2356 and NE2363) likely encode a specific glutaminyl-tRNA synthetase rather than a nondiscriminating glutaminyl/glutamyl-tRNA synthetase. The absence of a complete gatCAB operon encoding the Glu-tRNA amidotransferase further supports the glutamine specificity of glnS. The presence of both glnS genes may be the result of an earlier gene duplication event and subsequent functional divergence (14).
Catabolism of organic compounds.
In contrast to the genes for the biosynthesis of cellular constituents, genes for the catabolism of organic compounds are scant. For example, no genes for the degradation of either purine or pyrimidine nucleosides were identified. Salvage pathways for nucleotides were present and included genes for DNA exo- and endonucleases and ribonucleases. Nucleoside salvage appeared to be limited to uracil and thymidine and was nonexistent for the nucleobases. Likewise, complete pathways for the catabolism of most amino acids were not identified. Where a few such genes were present, they were most often also required for processes other than catabolism. For a few of the simpler amino acids (e.g., aspartate, serine, and glycine), for which transamination would yield an intermediate of a primary pathway such as the TCA cycle, catabolic pathways could be envisioned from the gene profile. Moreover, as discussed above, genes for complete glycolytic pathways and the TCA cycle are present, suggesting that complete oxidation of simple sugars and organic acids should be possible.
A notable exception to the dearth of genes for catabolic enzymes is fatty acid oxidation: genes for all of the enzymes required for fatty acid oxidation are present. As in E. coli and other bacteria, many of the activities of fatty acid oxidation are contained in two subunits of a multienzyme complex. The 3-hydroxyacyl-CoA dehydrogenase, enoyl-CoA hydratase, cis-Δ3-trans-Δ2-enoyl-CoA isomerase, and 3-hydroxyacyl-CoA epimerase activities are all contained in one subunit (encoded by fadB). The 3-ketoacyl-CoA thiolase is the second subunit (encoded by fadA). As in other bacteria, the fadA and fadB genes are adjacent. Although N. europaea does have phospholipase D, which cleaves the head group from phospholipids, genes for phospholipase A1 and A2 (enzymes that remove the fatty acids) and phospholipase C were not found, suggesting that N. europaea is not able to degrade phospholipids.
Transport systems. N. europaea has an array of active (primary and secondary) transporters (i.e., ion-coupled, ATP hydrolysis-driven, or redox-driven transporters). Approximately 285 ORFs in its genome (11.5%) are dedicated to the active transport of molecules across its membranes. In other gram-negative bacteria, 3 to 12% of the ORFs in the genome encode proteins involved in transport (78). The number of predicted ABC transporters in N. europaea is similar to the numbers in other lithoautotrophic bacteria but smaller than the numbers in facultative bacteria. For example, in Methanobacterium thermoautotrophicum ΔH (a lithoautotrophic thermophilic archaebacterium), 10 clusters are predicted to code for ABC transporters (69). In Aquifex aeolicus (an obligate chemolithotrophic eubacterium), 13 clusters similar to ABC transporter systems are present (22). In contrast, E. coli (a chemoorganoheterotrophic bacterium) has 51 clusters containing sequences similar to ABC transporters (out of 746 transport and binding gene products, of which 382 are characterized; 15.8% of the total ORFs [66]). Thermotoga maritima (representing a new genus of unique, extremely thermophilic eubacteria growing at up to 90°C) has 31 ABC transporters (57). sp. strain NRC-1 has at least 27 members of the ABC transporter superfamily.
In N. europaea, ca. 14% (40 ORFs) of the active transport proteins are dedicated to Fe transport (see below). The 13 putative active transporters that include ATP-binding cassettes (ABC) represent approximately 75 ORFs, or 3% of the total ORFs. These ABC transporters contain at least two of the three typical components of ABC transporters (nucleotide-binding domain, membrane-spanning domain, and solute-binding protein). ABC-type multidrug, protein, and/or lipid transporters predominate. For instance, capsular polysaccharide export and polysaccharide or polyol phosphate-type transporters are present, as is an ABC-type sugar transferase system involved in lipopolysaccharide synthesis with a putative membrane cation efflux permease. This lipopolysaccharide transporter may confer on N. europaea the ability to adhere to surfaces. Two ABC-type transporters for anions (e.g., nitrate) could also be inferred; these putative nitrate transport systems are similar to sulfonate and bicarbonate transporters (52). A sulfate permease and an ABC-type sulfate/molybdenum transport system are present. N. europaea has two ORFs coding for a potassium uptake system and another coding for a sodium-translocating NADH dehydrogenase. Two loci for heme-exporting proteins adjacent to cytochrome-related genes are also present in the genome.
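The percentages in this subsection are simple ratios of ORF counts. A minimal Python sketch that recomputes them is given here as a consistency check. It assumes (the text does not state this explicitly) that genome-wide percentages use the 2,460 predicted protein-encoding genes as the denominator and that the Fe-transport share is taken relative to the 285 active-transport ORFs; the helper name `percent` is illustrative only.

```python
# Consistency check of the transport-gene percentages quoted in the text.
# ASSUMPTION: genome-wide percentages use the 2,460 predicted protein-encoding
# genes as denominator; the Fe share is relative to the 285 transport ORFs.

TOTAL_PROTEIN_GENES = 2460   # predicted protein-encoding genes (genome total)
ACTIVE_TRANSPORT_ORFS = 285  # ORFs dedicated to active transport
FE_TRANSPORT_ORFS = 40       # active-transport ORFs dedicated to Fe transport
ABC_ORFS = 75                # ORFs in the 13 putative ABC transporters


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


shares = {
    "active transport / all genes": percent(ACTIVE_TRANSPORT_ORFS, TOTAL_PROTEIN_GENES),
    "Fe transport / active transport": percent(FE_TRANSPORT_ORFS, ACTIVE_TRANSPORT_ORFS),
    "ABC transporter ORFs / all genes": percent(ABC_ORFS, TOTAL_PROTEIN_GENES),
}

for label, value in shares.items():
    print(f"{label}: {value}%")
```

Under these assumptions the ratios come to about 11.6%, 14.0%, and 3.0%. The first is marginally above the 11.5% quoted for transport ORFs, which would be consistent with truncation rather than rounding, or with a slightly different denominator.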
Systems for the uptake of organic molecules are few in N. europaea. ABC transporters for a few specific substrates are present (i.e., for glutamine and for spermidine or putrescine). An amino acid permease and an amino acid transporter are present but are not linked with a nucleotide-binding domain or a membrane-spanning domain, which are characteristic of ABC transporters. Only one sugar transporter is apparent, a phosphotransferase system (PTS) with similarity to fructose or mannose transporters. Two additional PTSs are nitrogen related. The limited number of permease genes for organic molecules may contribute to the obligately lithoautotrophic lifestyle of N. europaea.
The genome also contains loci encoding a Hg scavenger-like transport system, which could be responsible for heavy metal tolerance. Furthermore, several candidate genes encoding divalent cation (Cd, Zn, and Co) transporters have been identified. Because the genes involved are highly redundant, iron and zinc transport are discussed separately below.
Iron uptake is among the most interesting revelations of the genome sequence. Although genes for siderophore biosynthesis seem to be absent, the genome of N. europaea contains 20 likely functional fecIR gene tandems, most of which are directly preceded or followed by genes encoding iron siderophore receptors (Fig. ). Among these are ORFs with significant similarity to known receptors for the uptake of ferrichrome-, pseudobactin-, and pyoverdin-like siderophores and other ferric iron siderophores, whose regulation seems to be linked to the FecI/FecR sigma factor/membrane sensor system of ferric-dicitrate iron accumulation (26). Overall, the genome of N. europaea appears to contain 22 genes encoding FecI sigma factor-like proteins (Fig. ). Our phylogenetic analysis of putative FecI-FecR homologs revealed that an ancestral fecIR gene tandem coevolved into two distinct subgroups of fecIR gene tandems after an early duplication event. Although the fecIR gene tandems have continued to coevolve through numerous gene duplication events within their subgroups, neither subgroup evolved with any preference for proximity to a particular iron siderophore-sensing receptor gene (see indices in Fig. ). Whereas all fecR genes were found next to a fecI gene, only two fecI genes were found not in tandem with a fecR gene. Because unnecessary redundancy is usually lost easily from prokaryotic genomes, we predict that the identified fecIR gene tandems are physiologically relevant to the high iron demand of Nitrosomonas. Additionally, a gene encoding an enterobactin-like siderophore receptor (NE1205) was found in close proximity to a substrate-binding protein (NE1206) and five ORFs that likely constitute an ABC type 2 uptake transport system (NE1207 to NE1211). The overall scenario suggests that N. europaea may use siderophores produced by other organisms in its environmental consortium while under iron stress and that the possible citrate mechanism may serve as a “last resort” when iron is not available from any other source. The advantage of this mechanism would be that the organism could meet its iron requirement without the costly secretion of reduced carbon. The fecIR gene tandem NE1217-NE1218 (followed by a TonB-dependent outer membrane receptor) is adjacent to a copF-like gene (NE1216) that may encode a Cu2+ transport ATPase. Copper ion transport is likely facilitated by the product of NE1019, which is highly similar to the CopA copper transport ATPase of Staphylococcus aureus. To prevent copper toxicity while allowing Cu utilization, N. europaea likely expresses the three copper resistance proteins A, B, and D (NE0279, NE0280, and NE2058); a copper-binding protein, CopC (NE1491); and an inner membrane copper tolerance protein (NE2389).
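The FecI/FecR trees in Fig. 5 were inferred by distance neighbor joining from CLUSTALW alignments. To illustrate how that method turns a matrix of pairwise distances into a tree, the following is a minimal Python sketch of the standard neighbor-joining algorithm. It is not the software used for Fig. 5; the function name, the frozenset-keyed distance dictionary, the generated internal node names, and the five-taxon toy matrix are all hypothetical, and the alignment and distance-correction steps that precede tree building are not shown.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Build an unrooted tree by neighbor joining.

    labels: taxon names (must not collide with the generated "nodeN" names).
    dist:   dict mapping frozenset({a, b}) -> pairwise distance.
    Returns a list of (parent, child, branch_length) edges.
    """
    nodes = list(labels)
    d = dict(dist)  # working copy; distances to new internal nodes are added
    edges = []
    next_id = 0

    def D(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        # r_i: total distance from node i to every other active node.
        r = {i: sum(D(i, k) for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(p[0], p[1]) - r[p[0]] - r[p[1]])
        u = f"node{next_id}"
        next_id += 1
        # Branch lengths from the new internal node u to i and to j.
        li = D(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = D(i, j) - li
        edges += [(u, i, li), (u, j, lj)]
        # Distance from u to each remaining node k.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (D(i, k) + D(j, k) - D(i, j)) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    # Connect the last two nodes directly.
    a, b = nodes
    edges.append((a, b, D(a, b)))
    return edges


# Usage with a hypothetical five-taxon distance matrix (not data from this study).
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {frozenset((names[x], names[y])): matrix[x][y]
        for x, y in combinations(range(len(names)), 2)}

for parent, child, length in neighbor_joining(names, dist):
    print(parent, child, length)
```

For this toy matrix the first join pairs a and b with branch lengths 2 and 3, and the seven edges sum to a total tree length of 17. Ties in Q are broken here by iteration order, which can change the topology; dedicated phylogenetics tools may resolve ties differently.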
FIG. 5. Phylogeny of FecI and FecR homologous proteins and associated iron siderophore receptors. Distance-neighbor-joining trees were derived from CLUSTALW alignments of FecI- and FecR-homologous protein sequences that were deduced from identified ORFs in the …
In addition to iron and copper sensors and transporters, the N. europaea genome contains three separate gene clusters putatively involved in divalent cation (cobalt, zinc, and cadmium) transport (Fig. ). Cluster 1 (NE0346/5/4/3) is organized similarly to the czc gene cluster of R. metallidurans CH34, and the genes in cluster 2 (NE0373/4/5/6/7) are arranged like the czt gene cluster of P. fluorescens 13525, whereas cluster 3 seems to lack the two-component regulatory system found in the other two clusters (Fig. ). This third czc gene cluster lies close to a gene coding for an Mg2+ transporter protein, MgtE (NE1633). Based on their similarity to genes encoding heavy metal efflux systems in Ralstonia and Pseudomonas spp., the czc and czt clusters may also encode heavy metal efflux systems in N. europaea. Because of its proximity to a putative Mg2+ uptake protein, the third czc cluster may instead be involved in metal uptake.
Oxidative stress. N. europaea, like virtually all other aerobic organisms, is expected to contain enzymes that convert superoxide and hydroperoxides into innocuous products (21). Abundant aerobic bacteria such as Bacillus subtilis and P. aeruginosa have elaborate and redundant complements of superoxide dismutases (SODs), hydroperoxidases (HPs), and “iron management” enzymes, and their expression is regulated through complex regulatory networks (70). The genome of N. europaea contains genes that encode a monofunctional small-subunit catalase (HPII, katA), a catalase-peroxidase (HPI, katG), a thioredoxin-dependent peroxide reductase (alkyl HP, ahpC), and an iron-containing SOD (Fe-SOD, sodB). The HPI-encoding gene is preceded by a truncated copy of its N terminus, as found in the genome of Burkholderia fungorum LB400, also a member of the β-proteobacteria. Most of the HPs are heme-containing enzymes, and genes coding for bacterioferritin (Bfr, bfr) and bacterioferritin comigratory protein (Bcp) were also identified. In contrast, genes encoding a thioredoxin reductase (NADH-peroxiredoxin reductase, AhpF), which is part of the thioredoxin redox couple in many bacteria, glutathione oxidoreductase (gorA), and other isozymes of HP or SOD were not found in the genome.
The cytochrome c peroxidase of N. europaea is a diheme cytochrome. Homologous enzymes require the reduction of one heme before they react with hydrogen peroxide. In contrast, the diferric form of the N. europaea enzyme reacts with hydrogen peroxide, making it a relatively better scavenger at higher cellular redox potential (2). Cytochrome P460 in N. europaea has a possible NO-scavenging role (10). The gene encoding cytochrome P460 was sequenced previously (8), and its sequence is corroborated by the genome sequence.
To detect and defend themselves against oxidative damage, E. coli cells sense their cytoplasmic redox state with the OxyR protein, whose expression is upregulated autogenously by hydrogen peroxide concentrations of 50 to 200 nM and which is a key regulator in many multigene stress defense networks in most bacteria (70). Surprisingly, the genome of N. europaea does not contain a gene similar to known oxyR genes. Thus, the katG genes, which are “normally” regulated by OxyR, are likely regulated independently in N. europaea, as is the gene encoding the ferric uptake regulator protein (Fur, NE0616, fur). The Fur protein, for instance, regulates ferric citrate (FecIR) and ferrichrome (fhu operon) transport, exotoxin synthesis, and the expression of HPs (HPI and HPII) in several bacteria (75). Genes encoding other regulators of the Fur protein family, such as the zinc uptake regulator Zur (NE1722, zur), were found in the genome of N. europaea. In enterobacteria, OxyR also controls the expression of RpoS, the alternative sigma factor for stationary-phase-specific transcription, whereas rpoS expression in P. aeruginosa depends on cell density (80). Notably, the genome of N. europaea lacks an rpoS-like gene entirely. On the other hand, genes encoding other alternative sigma factors, such as RpoN (σ54, NE0062), RpoH (σ32, NE0584), and RpoE (σ24, NE2331), were found in the genome.
Motility, cell division, and signaling. N. europaea is motile and can form biofilms; hence, its genome should contain the structural and regulatory genes necessary for the synthesis and regulation of flagella, as well as for coordinating flagellar synthesis and function with environmental cues and challenges. The complement of operons needed for flagellum biosynthesis is complete compared to available information from other bacteria (13). However, the organization of these genes and the locations of the operons in the genome are remarkably different from those in other bacteria (25). As far as is known, the flagellar master operon flhDC (NE2407/2406) is required for the transcriptional initiation of flagellation and chemotaxis, both directly, through activation and/or derepression of operons, and indirectly, through control of the FliA protein (NE2491), an alternative sigma factor (σ28). Five classes of methyl-accepting chemotaxis proteins (MCPs) are known in enterobacteria (13), and the N. europaea genome contains genes similar to members of three classes: (i) a Tsr-like protein (with HAMP domain; NE1864) that directly senses Ser, Ala, Gly, and aminoisobutyrate (tsr); (ii) a Tar-like protein (with PAC, PAS, MA, and HAMP domains; NE1863) that senses Asp and Glu directly and maltose through a periplasmic binding protein and is responsive to Co and Ni (tar); and (iii) a Tap-like protein (with MA domain; NE1251) that senses dipeptides through a periplasmic binding protein (tap). Additionally, a gene encoding a pseudomonad PilJ-like MCP (with the MA domain; NE1251) was identified. It is not yet known whether these N. europaea genes respond to the same attractants. Genes similar to those encoding the ribose/glucose/galactose sensor Trg and the redox sensor Aer in enterobacteria were not found. Compared with other genomes, the best-conserved cluster is the one containing the genes encoding CheA, CheW, and MCP-Tsr (NE1866, NE1865, and NE1864), followed by the genes encoding MCP-Tar, CheR, and CheB (NE1863, NE1861, and NE1859), an arrangement of chemotaxis genes also found in the E. coli genome. The other two MCP-encoding genes are adjacent to genes encoding additional CheW proteins (NE1250/51 and NE1396/97). Compared with P. aeruginosa, whose genome encodes more than 20 MCP-like proteins, N. europaea, with only four MCP-like proteins, has rather limited chemotactic responsiveness. Also remarkable is the considerable distance between the operons containing fliA (NE1923 and NE1924) and cheA (NE1866), which are usually located close together or in the same operon (25). The N. europaea genome also lacks loci encoding the FlgM/FliT proteins, which regulate FliA availability in enterobacteria through anti-sigma activity (51). Given the slow growth rate of N. europaea, such complex regulation of flagellation would likely cost more energy from its tight budget than it could save.

Because N. europaea seems able to respond to the autoinducer N-acylhomoserine lactone (AHL) and to form biofilms (4), some integration of these signal cascades can be expected. The flagellar master regulon flhDC is also involved in the regulation of virulence factor synthesis and cell division and seems to be a good candidate for functionally compensating for the missing RpoS-mediated stationary-phase regulation. Like its β-proteobacterial relative Neisseria, N. europaea contains only the essential suite of cell division proteins (ftsI, NE0985 to NE0994; ftsQAZ, NE0995 to NE0997; ftsK, NE1051; and minCDE, NE1831 to NE1829), and it lacks genes for the SulA, ZipA, FtsL, and FtsN proteins found in γ-proteobacteria (53). Surprisingly, it lacks the functional two-component regulatory systems of LasRI/RhlR, which play key roles in connecting quorum sensing, motility, stationary-phase response, and the synthesis of virulence and stress tolerance factors in many environmental bacteria such as pseudomonads (81). Nevertheless, since autoinducer synthesis branches off the fatty acid biosynthesis pathway and genes encoding the FAB pathway have been identified in N. europaea, we conclude that NE1184 encodes a putative AHL synthase similar to that found in P. fluorescens. In P. fluorescens, this autoinducer synthase produces both short- and long-chain AHLs (three have been identified). Although N-homoserine lactone acted as a signal molecule in cell density-regulated recovery from starvation in N. europaea, the signal molecule(s) actually produced by N. europaea have not been biochemically characterized.