The Synechocystis PCC6803 genome contains 3164 annotated coding sequences,
of which 2896 genes are at least 100 codons long. Of these, the predicted
expression levels based on E(g) range from 0.56 to 1.51, with 380 ORFs (13%)
qualifying as PHX by Definition I. Among 22 complete prokaryotic genomes
the fraction of PHX genes ranges from 4% in Bacillus subtilis to 17% in
Archaeoglobus fulgidus (4). The 47 PHX genes with E(g) ≥ 1.30 (referred to
as the ‘top PHX genes’) are reported in Table . Of the top PHX genes, 17
encode proteins that function in photosynthesis or respiration. Genes
encoding major chaperone/degradation polypeptides are PHX, such as the two
groEL genes, dnaK and clpC and two ftsH genes (cell division
metallopeptidases). Several genes involved with translation/transcription
processing, including rpoB, fus, tufA and infB, are among the top PHX genes.
Most RP genes are PHX, but only the gene encoding S2 has E(g) ≥ 1.30.
Several prominent PHX genes function in amino acid biosynthesis.
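The proportions quoted above follow directly from the gene counts, and the ‘top PHX’ designation is a simple threshold on E(g). The short Python sketch below reproduces that arithmetic for illustration only; it is not the authors' pipeline. The helper name `top_phx` and the small dictionary of E(g) values (copied from figures quoted elsewhere in this section) are assumptions introduced for the example.

```python
# Counts quoted in the text.
total_genes = 2896      # genes of at least 100 codons
phx_genes = 380         # genes qualifying as PHX by Definition I
top_phx_genes = 47      # PHX genes with E(g) >= 1.30

phx_fraction = phx_genes / total_genes       # share of genes that are PHX
top_fraction = top_phx_genes / phx_genes     # share of PHX genes in the top set

TOP_THRESHOLD = 1.30    # cut-off for the 'top PHX genes' used in the text


def top_phx(e_values, threshold=TOP_THRESHOLD):
    """Return the gene names whose E(g) meets the top-PHX threshold.

    Hypothetical helper: it applies only the E(g) cut-off and does not
    re-check the PHX definition itself.
    """
    return [gene for gene, e in e_values.items() if e >= threshold]


# A few E(g) values quoted in this section (illustrative subset only).
example = {"rbcL": 1.33, "petE": 1.32, "gap2": 1.30, "gap1": 1.07, "tpi": 0.77}

print(f"PHX fraction: {phx_fraction:.1%}")
print(f"Top PHX share of PHX genes: {top_fraction:.1%}")
print(top_phx(example))
```

The first figure rounds to the 13% reported in the text.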
Functional categories of PHX and PA genes
Functional gene designations were adopted from the Kazusa Research
Institute Web site (http://www.kazusa.or.jp/cyano). Although some protein
functional assignments are uncertain (especially multi-functional
proteins), these classifications have provided a useful reference
for the PHX genes.
Ribosomal proteins. Most RP genes of ≥100 codons (28 of 36) are PHX and
all have E(g) ≥ 0.93 (marginally PHX). Interestingly, RP genes in
Synechocystis do not reach the top expression levels that they attain in
most eubacterial genomes (4). In fact, of 17 complete eubacterial genomes
analyzed, only Synechocystis and Thermotoga maritima have no RP genes among
the highest 10 PHX genes. The RP gene with the highest PHX value in
Synechocystis is rpS2; this gene attains E(g) = 1.30 (rank 43). The RP
genes are rarely duplicated in bacterial genomes, but Synechocystis
contains two copies of the S1 RP. Both of the rpS1 genes encode proteins of
reduced lengths (328 and 305 amino acids) compared to the eubacterial S1
length, which generally exceeds 500 amino acids (4).
Translation/transcription processing factors (Table 2). Factors central to
translation/transcription typically carry the highest E(g) values. For
example, the elongation factor EF-G attains the highest E(g) value in the
genomes of Haemophilus influenzae, B.subtilis, Helicobacter pylori,
Rickettsia prowazekii, Chlamydia trachomatis and Pyrococcus abyssi (4). In
Synechocystis, EF-G (fus) is encoded by four similar genes with E(g) values
ranging from 0.68 to 1.43 (Tables and ).
Table 2. Genes of major translation/transcription processing factors ≥100 amino acids in length
Table 6. Duplicated genes annotated under the same gene name exhibiting disparate predicted expression levels (difference ≥ 0.30)
Chaperones (Table 3). This group includes genes encoding the principal
chaperones (e.g. heat shock proteins groEL, dnaK, grpE and htpG), as well
as proteins involved in trafficking, secretion and protein degradation.
Many genes in this category are PHX and several are present in multiple
copies. For example, there are three homologs of dnaK. The homolog that
registers the lowest E(g) value, sll0058, may be specifically involved in
pilus biogenesis (D.Bhaya, A.Takahashi and A.R.Grossman, unpublished
results) and therefore may not be required at as high a level as the more
commonly used chaperones. In contrast, both GroEL homologs are among the
top PHX genes (Table ). Of 17 complete eubacterial genomes analyzed (4),
Synechocystis, Mycobacterium tuberculosis and Vibrio cholerae carry two
copies of the groEL genes, but only Synechocystis maintains both copies as
PHX.
Table 3. Genes of the extended chaperone/degradation collection (also including protein assembly and export)
Aminoacyl tRNA synthetases and modification genes. Among genes
functioning in translation, RPs and translation processing factors
tend to be PHX while aminoacyl tRNA synthetases are generally not
PHX (except in Escherichia coli; S.Karlin, J.Mrázek, A.M.Campbell and
A.D.Kaiser, submitted for publication). The genes encoding
S-adenosylmethionine synthetase (metX) and aspartyl-tRNA synthetase (aspS)
are PHX in Synechocystis, although it is not clear whether metX belongs in
this functional class, since S-adenosylmethionine participates in a number
of different cellular processes (14).
Photosynthesis and respiration (Table 4). Most of the genes encoding proteins that participate
in photosynthesis and respiration are PHX. The macromolecular complexes
essential for photosynthesis are PSI and PSII, the light-harvesting complex
or PBS, the cytochrome b6f complex and ATP synthase.
Almost all components of these complexes are PHX or marginally PHX [E(g) ≥ 0.95] and
are often among the PHX genes with the highest E(g)
values (Tables and ).
Table 4. Genes of major complexes acting in photosynthesis ≥70 amino acids in length
Of the 23 genes directly involved in CO2 fixation, 13 are PHX. Several of
the genes in this class having the highest E(g) values also function in
glycolysis; these include fructose bisphosphate aldolase (fda and cbbA),
glyceraldehyde 3-phosphate dehydrogenase (gap) and phosphoglycerate kinase
(pgk). Enigmatically, another key enzyme functioning in both glycolysis and
photosynthetic CO2 fixation, triosephosphate isomerase (tpi, slr0783), is
not PHX [E(g) = 0.77].
Of the two copies of gap genes, gap2 [E(g) = 1.30] encodes a protein
that appears to function in both CO2 fixation and glycolysis and is
expressed at high levels under diverse conditions, whereas expression
of gap1 is hardly detected (15), even though it is PHX [E(g) = 1.07].
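The caption of Table 6 uses a difference of at least 0.30 in E(g) to call duplicated genes disparate in predicted expression. A minimal Python sketch of that comparison follows, using only values quoted in this section. The function name `expression_spread` and the `families` dictionary are assumptions made for the example; for fus, only the highest and lowest of the four E(g) values are given in the text, so only those two endpoints are listed.

```python
DISPARITY_THRESHOLD = 0.30  # difference in E(g) used in the Table 6 caption


def expression_spread(values):
    """Largest minus smallest E(g) in a gene family, rounded to 2 decimals.

    Rounding avoids floating-point noise when a spread sits on the threshold.
    """
    return round(max(values) - min(values), 2)


# E(g) values quoted in the text (illustrative, not the full Table 6).
families = {
    "fus": [1.43, 0.68],   # extremes of the four fus genes
    "gap": [1.30, 1.07],   # gap2 and gap1
}

# Keep only families whose spread meets the disparity criterion.
disparate = {
    name: expression_spread(values)
    for name, values in families.items()
    if expression_spread(values) >= DISPARITY_THRESHOLD
}

print(disparate)
```

Under this criterion the fus family (spread 0.75) counts as disparate, whereas the two gap genes (spread 0.23) do not, even though their measured expression differs markedly.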
Ribulose 1,5-bisphosphate carboxylase (Rubisco) catalyzes the carboxylation
of d-ribulose 1,5-bisphosphate, the primary reaction of CO2 fixation. Genes
encoding the large and small subunits of RuBP carboxylase, rbcL and rbcS,
are both PHX [E(g) = 1.33 and 1.13, respectively].
Synechocystis has five genes encoding homologs of the CO2-concentrating
mechanism protein CcmK. CcmK may be involved in formation of the
carboxysome, the site of RuBP carboxylase sequestration. Two of the five
genes are PHX [sll1028 and sll1029, both with E(g) = 1.20], while the other
three are not PHX (slr1838, slr1839 and slr0436). Interestingly, sll1028
(110 amino acids) and sll1029 (102 amino acids), as well as slr1838 and
slr1839, are tandemly arranged, whereas slr0436 may represent a fusion of
two ccmK homologs to generate a single ORF of 296 amino acids.
Among the genes for electron carriers that function in photosynthesis
and/or respiration, the main PHX gene encodes plastocyanin
(PetE) [E(g) = 1.32].
The Synechocystis genome contains a single copy
of the petE gene (sll0199). In contrast, another
electron carrier essential in photosynthesis, ferredoxin, is encoded
by four homologous genes. Multiple ferredoxins may be important
for guiding electron flow to specific electron acceptors; electrons
from ferredoxin can be used to reduce NADP (and CO2),
nitrite or sulfate. Two of the genes encoding ferredoxin register
as PHX and, notably, the ferredoxin gene slr0150 is PA.
Cytochrome c oxidase, composed of the three subunits CtaC, CtaD
and CtaE, is a complex of the respiratory chain that catalyzes the
reduction of oxygen to water and generates an electrochemical potential
that can provide energy for numerous cellular processes. There are
two cytochrome c oxidase operons on the Synechocystis PCC6803
genome. The first operon at position 790–794 kb includes
all three cta genes, while the second operon at
position 1540–1542 kb contains ctaD and ctaE, with ctaC located at 1698
kb. The latter genes are not PHX.
Energy metabolism (apart from photosynthesis and respiration). Seven
glycolysis genes typical of most bacteria are PHX. These genes also
function in photosynthetic CO2 fixation, which apparently requires high
level expression. None of the genes encoding TCA cycle enzymes are PHX in
Synechocystis, which contrasts with the finding that several TCA cycle
genes from most eubacteria are PHX (4). However, cyanobacteria maintain a
truncated TCA cycle (16, 17) and the major catabolic pathway for
supplying cells with energy and reductant for respiration, dark
growth and nitrogen fixation is the oxidative pentose phosphate
pathway (18). Therefore, unlike most bacteria, cyanobacteria may not
require high expression levels of TCA cycle enzymes and indeed these
enzymes are not highly expressed. Among the oxidative pentose phosphate
pathway enzymes, 6-phosphogluconate dehydrogenase (Gnd) and
phosphogluconolactonase (DevB) are marginally highly expressed
[E(g) = 1.00 and 0.96, respectively], whereas the first enzyme of the
pathway, glucose 6-phosphate dehydrogenase (Zwf), records a reduced
E(g) of 0.77. The essential enzymes of the non-oxidative part of the
pentose phosphate pathway, transketolase and transaldolase, are both PHX
at the levels E(g) = 1.31 and 1.21, respectively.
Replication and repair. In most bacteria and
in Synechocystis, genes functioning in DNA replication
are generally not PHX. The replication initiation protein DnaA is
marginally PHX [E(g) = 1.05] and
one of two genes encoding DNA gyrase subunit A (slr0417) is PHX [E(g) = 1.18]. As in many other
prokaryotic genomes, the genes encoding the single-stranded DNA-binding
protein Ssb and RecA, which function in repair and replication,
qualify as PHX.
Regulatory proteins. Bacterial genes encoding regulatory proteins are
seldom PHX. In Synechocystis PCC6803, 13 of 133 putative regulatory genes
are PHX, but all have E(g) < 1.12. Four regulatory proteins satisfy
Definition II as PA genes, including sll0709 (LlaI.2, required for the LlaI
restriction system), sll1408 (regulatory protein PcrR), sll0776
(eukaryotic protein kinase PknA) and sll0797 (a regulatory component
of the sensory transduction system OmpR subfamily). Some of the
regulatory elements may be toxic to bacterial cells when expressed
above a threshold level. The PA character of a gene could result
in low levels of expression, which would diminish the potential
of the gene product to become toxic to the cell. With respect to
LlaI.2, restriction enzymes are often laterally transferred
among bacterial strains and may be recognized as PA in the recipient
bacterium (9, 19).
Synechocystis possesses seven PHX genes associated with sensory
transduction: sll1124, slr1805, slr1760, slr1909, sll1291, slr2024 and
slr0947. This is more than in any other complete prokaryotic genome. These
regulators may be needed at elevated levels for one or more of the
following reasons: they control a regulon containing a large number of
genes, the proteins must be expressed over a broad range of concentrations
for more accurate control of downstream regulatory events, or they are
multifunctional.
The distribution of PA and PHX genes in the genome
Figure shows the distribution of clusters of PHX and PA genes on the
Synechocystis genome identified by r-scan statistics (20). There are no
long intervals devoid of PHX or PA genes. A cluster of PHX genes at
positions 830–846 kb includes 15 PHX RP genes, adenylate kinase (adk) and
two ORFs, slr1894 [155 amino acids, E(g) = 1.22] and slr1896 [129 amino
acids, E(g) = 1.06]. slr1894 bears weak sequence similarity to the Dps
(DNA protection during starvation) family of stress proteins. Another
cluster of PHX genes contains three ORFs, sll0480 [411 amino acids,
E(g) = 1.10], sll0481 [154 amino acids, E(g) = 1.07] and sll0482 [406
amino acids, E(g) = 1.05]. sll0480 is probably an aminotransferase,
whereas sll0481 and sll0482 exhibit no recognizable sequence similarity to
known proteins. All three are transcribed in the same orientation,
suggesting that they constitute an operon. There is another cluster of
three consecutive PHX ORFs, slr1657 [274 amino acids, E(g) = 1.05],
slr1658 [198 amino acids, E(g) = 1.12] and slr1659 [112 amino acids,
E(g) = 1.07], that is also likely to comprise an operon. Statistically
significant clusters of PHX genes also include the RuBP carboxylase operon
(rbcL–rbcX–rbcS).
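The r-scan approach mentioned above works on the ordered positions of the marked genes: an r-scan is the distance spanned by r consecutive gaps between markers, and an unusually small minimum r-scan suggests a cluster. The Python sketch below computes only the r-scan lengths and the tightest window. It does not implement the significance thresholds of (20), and the function names and gene positions are hypothetical, chosen only to mimic a cluster near 830–846 kb.

```python
def r_scans(positions, r):
    """Distances spanned by r consecutive gaps between sorted marker positions."""
    pts = sorted(positions)
    return [pts[i + r] - pts[i] for i in range(len(pts) - r)]


def tightest_window(positions, r):
    """Return (start, end, length) of the window with the smallest r-scan."""
    pts = sorted(positions)
    spans = r_scans(pts, r)
    i = min(range(len(spans)), key=spans.__getitem__)
    return pts[i], pts[i + r], spans[i]


# Hypothetical marker positions in kb (not taken from the genome annotation).
phx_positions_kb = [100, 400, 830, 832, 835, 838, 846, 1500, 2200]

print(r_scans(phx_positions_kb, 4))
print(tightest_window(phx_positions_kb, 4))
```

In a real analysis the minimum r-scan would be compared with the value expected for markers placed at random along the genome before a cluster is declared statistically significant.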
PA genes form three major clusters and several smaller clusters. A
cluster covering positions 353–385 kb contains the genes rfbF, rfbE, rfbU
and galE1, one transposase (slr1075) and 13 ORFs of unknown function. The
rfb genes may be required for lipopolysaccharide biosynthesis.
Lipopolysaccharide biosynthesis genes of bacteria are often alien, as
noted for E.coli (21). Three ORFs in the cluster (slr1063, slr1065 and
slr1066) show weak similarity to glycosyl and galactosyl transferases,
which function in lipopolysaccharide biosynthesis, and slr1616 is weakly
similar to GalE.
A second PA cluster covers positions 1614–1639 kb.
This cluster contains seven transposases, six ORFs and four genes with
assigned function, namely KpsM, KpsT, SpsC and SpsA. KpsM and KpsT function
in polysialic acid transport and SpsC and SpsA function in lipopolysaccharide
biosynthesis. The third cluster at positions 3096–3113
kb features eight PA genes, including transposases and ORFs and
the magnesium/cobalt transport polypeptide sll0671.