The S. pneumoniae single circular chromosome of 2,038,615 bp (40% G+C content) contains 2,043 predicted protein coding regions and 73 noncoding RNA genes, which include four rRNA operons. The genomic origin of replication has not been experimentally identified in S. pneumoniae; however, based on the presence of clusters of DnaA boxes and other genomic features, we hypothesize that the S. pneumoniae origin of DNA replication is upstream of dnaA, which is gene spr0001 in our nomenclature system (16).
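The two sequence statistics used in this paragraph, G+C content and clustering of DnaA boxes, can be illustrated with a minimal Python sketch. The box consensus `TTATCCACA` (the E. coli consensus), the one-mismatch tolerance, and the 500-bp window are assumptions chosen for illustration; the text does not state which consensus or window was used, and the function names are hypothetical.

```python
from bisect import bisect_left

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_boxes(seq, box="TTATCCACA", max_mismatches=1):
    """Return start positions matching the box on either strand.

    The default consensus is the E. coli DnaA box (an assumption here).
    """
    seq = seq.upper()
    patterns = (box, reverse_complement(box))
    k = len(box)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for pattern in patterns:
            mismatches = sum(a != b for a, b in zip(window, pattern))
            if mismatches <= max_mismatches:
                hits.append(i)
                break
    return hits


def densest_window(hits, window=500):
    """Return (start, count) for the window holding the most box hits.

    `hits` must be sorted in ascending order, as find_boxes returns them.
    """
    best_start, best_count = None, 0
    for i, start in enumerate(hits):
        count = bisect_left(hits, start + window) - i
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count
```

Under these assumptions, a window with an unusually high box count near dnaA would be consistent with the hypothesized origin, although it would not replace experimental identification.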
As anticipated from the work of Iannelli and colleagues (21), relative to strain D39, the encapsulated strain from which R6 was ultimately derived, we noted a 7,504-bp deletion within the ≈18-kbp region that encodes the capsule biosynthesis genes. This deletion results in the absence of seven complete genes as well as the 3′ end of cps2A and the 5′ end of cps2H.
Other than genes associated with capsule synthesis, the genes encoding several putative virulence functions are present in the R6 genome (Table 1). These include the genes for previously described S. pneumoniae surface proteins, secreted proteins and bacteriocins, and all previously reported two-component response regulator systems (13 potential histidine protein kinases and response regulator pairs plus an unpaired 14th response regulator) (46).
| TABLE 1. Genes found in R6-encoded proteins that have been studied for a role in S. pneumoniae virulence or as protective antigens to S. pneumoniae |
Drug resistance via efflux pumps is an important contributor to virulence in Staphylococcus aureus and other gram-positive pathogens. Although S. pneumoniae contains 14 genes that may encode antibiotic efflux pumps (Table 2), these genes may not be significant contributors to S. pneumoniae virulence. Antibiotic extrusion is not as common a source of resistance in S. pneumoniae as it is in S. aureus (e.g., to quinolones). S. pneumoniae is not intrinsically resistant to most classes of agents, and among the exceptions (e.g., aminoglycosides and quinolones) resistance is not the result of drug efflux pumps.
| TABLE 2. Possible/probable drug efflux pumps encoded by S. pneumoniae R6 |
Surface proteins. Surface proteins are of special interest because of their potential role in virulence and their possible utility in vaccine development and also because of their potential accessibility to antimicrobial agents. The R6 genome includes single copies of the previously described virulence-associated genes, including four that encode proteins that are under study as vaccine candidates (PspA, PsaA, CbpA, and pneumolysin) (5).
Based on sequence analyses, we predict that a large number of proteins either reside on the S. pneumoniae cell surface or are secreted from the cell. These proteins include 471 with predicted signal peptide sequences, 109 possessing lipoprotein lipid attachment sites, and 10 that are recognized by choline-binding domains, an unusual means of surface attachment found in S. pneumoniae (Fig. ). We could predict no function for ≈23% of these potential surface-located or secreted proteins, which likely play roles in pneumococcal cell surface biology.
In S. aureus, a transpeptidase called sortase anchors exported proteins containing an LPXTG motif followed by a C-terminal hydrophobic domain and a charged tail to the cell wall peptidoglycan (29, 41). Although sortases are predicted to be present in all gram-positive bacteria, previously no sortase ortholog had been identified in S. pneumoniae, nor could we identify one using BLAST searches. Using a Smith and Waterman algorithm (43), we determined that a single S. pneumoniae R6 gene, spr1098, likely coded for a sortase. Furthermore, we identified 13 genes that coded for proteins containing the LPXTG motif and other sortase substrate features. Six of these proteins are known to be present on the cell surface, while seven are novel and currently categorized as hypothetical with no known function (Fig. ). Our observations about sortase in strain R6 conflict with a report by Pallen and colleagues, who identified four sortase-like protein genes and “many” potential sortase substrates in the genome of a virulent S. pneumoniae strain with a type 4 capsule (34).
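The two computational steps named in this paragraph can be sketched in Python for illustration. The first function is a simplified Smith and Waterman local alignment score with a flat match/mismatch scheme and a linear gap penalty; real protein searches would normally use a substitution matrix and affine gaps, and the scoring values here are assumptions. The second function screens for the sortase substrate features described above (an LPXTG motif, then a hydrophobic stretch, then a charged tail). Its residue classes and length thresholds are hypothetical choices, not the criteria used in this study.

```python
import re


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Residues treated as hydrophobic; an illustrative assumption.
_HYDROPHOBIC_CLASS = "[AVLIFMWGC]"


def has_sortase_signal(protein, max_tail=50, min_hydrophobic=12):
    """Return True if the protein ends with an LPXTG-type sorting signal.

    Requires LPXTG within `max_tail` residues of the C terminus, followed by
    a hydrophobic run of at least `min_hydrophobic` residues and then at
    least one positively charged residue (K or R).
    """
    protein = protein.upper()
    hydrophobic = re.compile(_HYDROPHOBIC_CLASS + "{%d,}" % min_hydrophobic)
    for motif in re.finditer("LP.TG", protein):
        tail = protein[motif.end():]
        if len(tail) > max_tail:
            continue
        run = hydrophobic.search(tail)
        if run and re.search("[KR]", tail[run.end():]):
            return True
    return False
```

A screen of this kind only nominates candidates; the difference between the 13 substrates reported here and the counts reported for the type 4 strain could reflect either strain differences or different search criteria.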
Lipoteichoic acid (LTA) is another cell surface component that contributes to the bacterium's interaction with the human host. Biochemical studies have not detected the presence of D-alanine, a key component of LTA in many organisms, in S. pneumoniae LTA (14). In conflict with the apparent absence of D-alanine in S. pneumoniae LTA, we found an apparently complete dltABCD operon that is homologous to those responsible for the addition of D-alanine to LTA in Bacillus subtilis and in Lactobacillus casei. Previously we suggested that this operon may be silent or defective or that these genes may be active under specific physiological conditions (3). Gene expression studies (data not shown) revealed expression of mRNA from each of these genes under normal laboratory growth conditions. The precise role of the dltABCD operon in the biology of S. pneumoniae remains unknown, although inactivation of this operon in S. mutans confers increased acid sensitivity (4).
Competence. S. pneumoniae competence, i.e., its natural capacity to take up DNA, has been studied in detail, and a number of competence-specific operons have been identified (23). S. pneumoniae R6 contains all of the genes induced during competence as noted by Lee and Morrison (23), including two identical copies of comX (each adjacent to a ribosomal operon). Additional genes reported to be induced during competence, but whose role in this process remains unknown, are also present (36, 40). These 49 putative competence genes are grouped in 30 apparent operons and are found mostly on the leading strands extending away from the putative origin of replication (as are ≈80% of all S. pneumoniae genes).
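The leading-strand bias mentioned above can be expressed as a simple calculation on gene coordinates. The following Python sketch assumes a circular chromosome with the replication terminus directly opposite the origin, which is a simplification and is not stated in the text; the function names and the toy gene list are hypothetical.

```python
def on_leading_strand(start, strand, genome_len, origin=0):
    """Return True if a gene is transcribed in the direction of replication.

    Assumes the terminus lies half a genome away from the origin, so genes
    on the first half (clockwise from the origin) lead when on '+', and
    genes on the second half lead when on '-'.
    """
    offset = (start - origin) % genome_len
    on_right_arm = offset < genome_len / 2
    return strand == ("+" if on_right_arm else "-")


def leading_fraction(genes, genome_len, origin=0):
    """Return the fraction of (start, strand) genes on the leading strand."""
    if not genes:
        return 0.0
    leading = sum(on_leading_strand(s, st, genome_len, origin) for s, st in genes)
    return leading / len(genes)


# Toy example on a 1,000-bp circular genome (illustrative data only).
toy_genes = [(100, "+"), (300, "+"), (400, "-"), (600, "-"), (900, "-")]
toy_fraction = leading_fraction(toy_genes, genome_len=1000)
```

Applied to an annotated gene list, a value near 0.8 would correspond to the ≈80% figure quoted in the text, provided that the origin and terminus assumptions hold.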
As might be predicted as a consequence of the capacity of S. pneumoniae to take up DNA, its genome is littered with genes that are apparently derived from other bacteria. Horizontal gene transfer is clearest for those genes that have been found only in gram-negative bacterial genomes. There are 40 ORFs that are similar to genes in gram-negative bacteria and that have not been found in other gram-positive genome sequences. This is not surprising, because S. pneumoniae occupies the same niche in the human respiratory system as several gram-negative species. Additionally, at least 2% of S. pneumoniae genes are significantly truncated relative to orthologous genes characterized in other bacteria (Table 3). Many of the deletions are at the 5′ ends of the ORFs, which suggests that the ORFs may be nonfunctional remnants of their parental genes. This incidence may also be a consequence of competence. Coding regions may be missing from these genes because only parts of the ORFs were acquired during the assimilation of foreign DNA, or because the genes were not essential to the pneumococcus and mutations in them are of no consequence. Transporters are the most frequently truncated genes. Among that set are five ORFs that are similar to genes encoding drug efflux pumps.
| TABLE 3. Truncated and/or fragmented genes in S. pneumoniae R6 |
Comparative genomics and metabolism. In most respects, the S. pneumoniae gene complement is very similar to that of the prototypic gram-positive bacterium B. subtilis. More than 53% of the S. pneumoniae genes have highly similar counterparts in the B. subtilis genome (Fig. ). Systems for cell division, DNA replication and repair, translation, cell wall biosynthesis, and some central catabolic and biosynthetic pathways are basically the same as in B. subtilis. Major cellular systems and features that are notably different include energy metabolism, transport, amino acid biosynthesis, transcription termination, intracellular proteases, and the presence of three large sets of S. pneumoniae-specific repetitive elements in the genome.
As is characteristic of the lactic acid bacteria, S. pneumoniae is a nutritionally fastidious facultative anaerobe requiring a complex medium for growth. This bacterium obtains energy strictly via fermentation and is incapable of respiratory metabolism, either aerobically or anaerobically, as is true of all streptococcal species (38). The only nutrients from which the streptococci can obtain sufficient energy to support growth and cell division are carbohydrates, which are oxidized to pyruvate via glycolysis (with the exception of a few species that can ferment arginine). We identified a large set of genes that encode enzymes necessary for transport of at least 12 different carbohydrates into the cell and for their subsequent conversion to an intermediate in glycolysis.
S. pneumoniae R6, as expected, encodes all genes necessary for the oxidation of carbohydrates to pyruvate via glycolysis and would be expected to reoxidize most, if not all, of the NADH produced by the reduction of pyruvic acid to lactic acid. R6 contains genes for the synthesis of phosphotransacetylase, acetokinase, and NADH oxidase, which would allow it to convert pyruvate to acetate with concomitant production of an additional ATP, and the reoxidation of NADH (38). Although fermentation is the least energy efficient of oxidative processes, S. pneumoniae did not maximize this energy production by exclusively using the phosphoenolpyruvate-dependent phosphotransferase system to import carbohydrates. Five sugar species are imported by using energetically less-efficient ABC transporters. We found no genes that might encode cation antiporters of sugars, although we identified several amino acid/cation symport systems (Fig. ). All genes necessary for synthesis of the major ATPase of lactic acid bacteria, the F0F1-ATPase, are present. This proton pump works at the expense of ATP, but it can also serve as an ATP synthase, as well as serving as the major regulator of intracellular pH among lactic acid bacteria. We did not find the genes required for a complete electron transport chain that might be associated with either aerobic or anaerobic respiration.
No lactic acid bacterial species encodes a complete tricarboxylic acid (TCA) cycle, and the S. pneumoniae R6 genome contains none of the 18 genes comprising this aerobic oxidative pathway. In other organisms, including those without complete TCA cycles, some of the TCA enzymes also have roles in the synthesis of certain amino acid precursors. As a result, S. pneumoniae R6 is incapable of synthesizing aspartate (and hence lysine, methionine, threonine, and isoleucine) from oxaloacetate, nor can it synthesize glutamate (and hence arginine) via α-ketoglutarate. A defined medium developed specifically for S. pneumoniae contains those amino acids, so the incomplete biosynthetic pathways were expected. We were unable to identify complete pathways for the synthesis of glycine, histidine, and leucine, all of which are included in the S. pneumoniae defined medium. Valine is also included in the S. pneumoniae defined medium, so identification of an apparently complete pathway for valine biosynthesis was unexpected (42).
The presence in S. pneumoniae of an ortholog to the Cercospora nicotianae pdx1 gene suggests S. pneumoniae may have a pyridoxal biosynthetic pathway (12). The biosynthesis pathways for other required cofactors (biotin, choline, pantothenate) are either incomplete or absent. Presence of partial pathways for the synthesis of many of these amino acids and cofactors in S. pneumoniae is not surprising. In many cases these enzymes make possible the conversion of molecules imported into the cell into other necessary metabolic components. Glutamine is an example of this type of metabolic conversion. Although S. pneumoniae cannot make the starting material for glutamate, α-ketoglutarate, it does encode the enzymes needed to utilize glutamine as a nitrogen source. In S. mutans, glutamine has been shown to be a principal source of nitrogen (8). There are 22 genes encoding the elements of 7 different ABC transporters predicted to transport glutamine. That represents 10% of the transport genes in S. pneumoniae. The allocation of the S. pneumoniae genome capacity to glutamine transport suggests that glutamine is also needed for more than its role as a component of proteins.
Hydrogen peroxide is produced by S. pneumoniae through the action of pyruvate oxidase (SpxB) under conditions of aerobic growth. This may be a mechanism by which the pneumococcus inhibits the growth of other common pathogens of the human upper respiratory tract such as Haemophilus influenzae, Moraxella catarrhalis, and Neisseria meningitidis, which are infrequently cocultured with S. pneumoniae from patient samples (35). Unlike those gram-negative species, S. pneumoniae has the capacity to resist oxidative stress caused by H2O2. In Escherichia coli, oxidative stress induces the expression of a set of ≈30 proteins under the transcriptional control of OxyR (6). Based on their similarity to OxyR, either spr0593 or spr0828, which are bacterial regulatory proteins of the LysR family, might possibly regulate the S. pneumoniae enzymes synthesized in response to H2O2. These include superoxide dismutase, glutathione reductase, glutaredoxin, DNA-binding stress protein, and two thioredoxin reductases. Although there are also genes for several peroxidases that can be used to ameliorate oxidative stress, S. pneumoniae does not encode catalase.
Bacterial energy-dependent intracellular proteases perform a variety of tasks, possibly including that of the proteasome, which degrades aberrant and nonfunctional proteins in eukaryotes and archaea (10). S. pneumoniae R6 possesses single copies of the genes encoding the ClpP and FtsH proteases, but it is notably deficient of the genes encoding HslV and the ubiquitous Lon protease. While some bacteria with relatively large genomes, such as E. coli and B. subtilis, encode all four of these energy-dependent proteases, most eubacteria encode only a subset (10). The S. pneumoniae energy-dependent protease gene set appears to be characteristic of the gram-positive genera Enterococcus, Streptococcus, and Staphylococcus, but not of the mycoplasmas.
Repetitive elements. DNA sequences from three classes of repetitive elements, BOX, RUP, and IS, comprise >3% of the S. pneumoniae genome. These kinds of repetitive elements make up more of the S. pneumoniae genome than of any other bacterial genome sequenced to date. Functions for some of these sequences are controversial. The BOX elements are predicted to form stable secondary structures that may serve as the binding site for a protein responsible for modulating the expression of downstream genes (27). Insertion of heterologous DNA into the BOX element upstream of the comA gene produces S. pneumoniae incapable of competence (27). Additionally, Weiser showed that insertion of a BOX element upstream of a locus apparently involved in phase variation increases expression of downstream genes encoding the opacity phenotype (50). The 107-bp RUP elements, predicted to form stable secondary structures, are proposed to be active insertion elements transactivated by the transposase of IS630-Spn1 (32).
Almost all BOX and RUP elements are entirely in intergenic spaces. We hypothesized that analysis of the locations of the BOX and RUP elements relative to the transcriptional orientation of the genes surrounding them might offer clues about their potential regulatory roles. In S. pneumoniae, between adjacent genes in the same transcriptional orientation, the boundaries of transcriptional units are often unclear; accordingly, promoters and transcription termination signals are difficult to identify. Between pairs of adjacent genes oriented 5′ end to 5′ end on opposite strands of the chromosome, or at least in the vicinity of those genes, there are likely to be pairs of transcriptional promoters. Likewise, there must be a transcription termination signal or signals between pairs of adjacent genes oriented 3′ end to 3′ end on opposite strands of the chromosome (factor-independent transcription termination signals have been identified in streptococci that can function bidirectionally [44]). Almost 3 times as many BOX elements and 1.5-fold more RUP elements are located between the genes oriented 3′ to 3′ than between genes oriented 5′ to 5′, and the insertion of IS elements flanking some of these elements may artificially deflate those ratios (Table 4). This suggests a role for the RUP and BOX elements in transcription termination. Another possible function is suggested by the fact that secondary structures that RUP and BOX elements would assume at the 3′ ends of mRNAs are more complex than those observed for other factor-independent transcription termination signals (27, 32), and S. pneumoniae does not encode rho factor. These elements might enhance gene expression by either stabilizing mRNAs or serving as binding sites for regulatory proteins.
| TABLE 4. Transcriptional orientations of genes flanking BOX and RUP repetitive elements |
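The orientation analysis described above amounts to classifying each intergenic element by the strands of its two flanking genes and comparing the counts. The Python sketch below shows one way to do this; the strand encoding, function names, and toy input are hypothetical, and the data are illustrative rather than taken from Table 4.

```python
from collections import Counter


def classify_pair(left_strand, right_strand):
    """Classify the intergenic region between two adjacent genes.

    Genes are given in chromosomal order. A '+' gene on the left and a '-'
    gene on the right converge, so their 3' ends face the region; the
    reverse arrangement places their 5' ends at the region.
    """
    if left_strand == right_strand:
        return "same"
    return "3'-3'" if left_strand == "+" else "5'-5'"


def orientation_counts(flanks):
    """Count orientation classes for a list of (left, right) strand pairs."""
    return Counter(classify_pair(left, right) for left, right in flanks)


def convergent_to_divergent_ratio(counts):
    """Return the 3'-3' to 5'-5' ratio, or None if no 5'-5' pairs exist."""
    if counts["5'-5'"] == 0:
        return None
    return counts["3'-3'"] / counts["5'-5'"]


# Toy flanking-gene strands for six elements (illustrative data only).
toy_flanks = [("+", "-")] * 3 + [("-", "+")] + [("+", "+")] * 2
toy_counts = orientation_counts(toy_flanks)
toy_ratio = convergent_to_divergent_ratio(toy_counts)
```

A ratio well above 1, as reported for BOX elements, would be consistent with a role in transcription termination. A complete analysis would also need the background frequency of each orientation class among all intergenic regions, which this sketch does not compute.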
Previous analyses indicated that numerous IS elements were present in the DNA of various strains of S. pneumoniae. The genome of R6 contains at least 60 complete or partial copies of 10 different IS elements, representing the families ISL3, IS5, IS630, IS3, IS30, and IS605. We identified three novel IS elements. We did not find an IS1202, which was previously identified in a progenitor of the R6 strain, S. pneumoniae D39 (31). Most of the copies of the IS elements appear to be only remnants, as only seven possess the expected full-length sequences of putative transposase genes. The remaining copies all contain frameshifts, stop codons, or both within the ORF, and many have substantial amino acid substitutions, suggesting that they are no longer active. It is possible that these inactive insertion elements still play an important role in the evolution of this genome. For example, they may provide regions of homology that are sites for homologous recombination in the acquisition of genes from related organisms carrying these same insertion sequences but different flanking genes (7).
The capacity to identify all potential genes within this pathogen should greatly facilitate the identification of novel targets for antibiotic discovery as well as new candidates for vaccine development. This process will be significantly enhanced by the comparison of the S. pneumoniae R6 sequence to that of pathogenic strains of S. pneumoniae (www.tigr.org and http://genome.microbio.uab.edu/strep/). These comparisons, in concert with genetic and gene expression studies, should catalyze expansion of S. pneumoniae biology.