General genome features
The single circular chromosome of strain CGSP14 contains 2,209,198 bp with a G + C content of 39.5% (Figure ). The genome sequence has been deposited in GenBank (accession no. CP001033). Base pair one of the chromosome was assigned within the putative origin of replication. The genome encodes 58 tRNAs, 12 rRNAs organized in four rRNA operons, and 3 structural RNAs. Biological roles were assigned to 67% of the 2,206 predicted protein-coding sequences (CDSs), according to a classification scheme adapted from Riley [18]. Seventy-nine percent of the CDSs were transcribed in the same orientation as DNA replication, a feature that appears common among low-G+C Gram-positive bacteria [15]. GC skew analysis placed the replication termination site near 1.1 Mb, almost exactly opposite the origin of replication on the chromosome (Figure ). The genome also contains 65 pseudogenes, most of which are IS elements or genes encoding hypothetical proteins.
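GC skew analysis, used above to place the terminus, can be sketched in a few lines. The minimal Python example below assumes non-overlapping windows and the common convention that the cumulative skew (G - C)/(G + C) reaches its minimum at the origin and its maximum at the terminus. The function names and the 1,000-bp default window are illustrative choices, not the parameters used for CGSP14.

```python
def cumulative_gc_skew(seq: str, window: int = 1000) -> list[float]:
    """Running sum of (G - C) / (G + C) over consecutive, non-overlapping windows."""
    seq = seq.upper()
    running, values = 0.0, []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:  # windows without G or C contribute nothing
            running += (g - c) / (g + c)
        values.append(running)
    return values


def predict_ori_ter(seq: str, window: int = 1000) -> tuple[int, int]:
    """Estimate (origin, terminus) positions in bp on a circular chromosome.

    The origin is taken at the global minimum of the cumulative skew and the
    terminus at its global maximum; each is reported as the end coordinate of
    the window where the extreme is reached, wrapped modulo the sequence length.
    """
    skew = cumulative_gc_skew(seq, window)
    ori_idx = min(range(len(skew)), key=skew.__getitem__)
    ter_idx = max(range(len(skew)), key=skew.__getitem__)

    def window_end(idx: int) -> int:
        return min((idx + 1) * window, len(seq)) % len(seq)

    return window_end(ori_idx), window_end(ter_idx)


# Toy chromosome: a G-rich replichore followed by a C-rich one.
toy = "G" * 5000 + "C" * 5000
print(predict_ori_ter(toy))  # (0, 5000)
```

On a real chromosome the curve is noisy, so the extremes are usually read from a plot rather than taken blindly from a single minimum and maximum.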
Previous studies showed that the S. pneumoniae genome is rich in IS elements, which make up a larger proportion of it than of any other bacterial genome sequenced to date [13,15]. We identified 80 IS elements in the CGSP14 genome (Additional file 1). Most of them appeared to be degenerate owing to insertions, deletions or point mutations; only twelve were intact. Although these degenerate IS elements might be inactive and non-functional, they could still provide sites for homologous recombination through which novel genes are acquired from related species.
Comparative genomic analysis
Comparison of the CGSP14 genome with four complete and twelve draft pneumococcal genomes (Table ) provided new insights into the rapid evolution of the pneumococcal genome. Like other bacterial pathogens, S. pneumoniae possesses a conserved core genome interspersed with regions of small- and large-scale variation (Figure ). In total, 1,619 orthologous genes were shared by all seventeen pneumococcal genomes (Additional file 2). When these orthologs were searched against the COG database, 36% were metabolism-related, 36% were associated with other known functions, 16% had poorly characterized functions, and 12% had no COG hit and mainly encoded hypothetical proteins (Additional file 3). Among the sixteen strains used for comparison, CGSP14 shared the largest numbers of orthologous genes with Spn23F (2,055) and SPnINV200 (2,049), indicating that CGSP14 is most similar to these two sequenced strains.
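The core-genome count and the pairwise comparison above reduce to set operations once genes have been clustered into orthologous groups. The sketch below assumes that clustering has already been done (the clustering method is not specified here) and uses made-up group identifiers; it is illustrative only.

```python
def core_genome(groups: dict[str, set[str]]) -> set[str]:
    """Orthologous groups present in every genome."""
    return set.intersection(*groups.values())


def rank_by_shared(reference: str, groups: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Other genomes ranked by the number of orthologous groups shared with `reference`."""
    ref = groups[reference]
    shared = [(name, len(ref & genes)) for name, genes in groups.items() if name != reference]
    return sorted(shared, key=lambda item: item[1], reverse=True)


# Hypothetical orthologous-group memberships (not real data).
groups = {
    "CGSP14": {"og1", "og2", "og3", "og4"},
    "Spn23F": {"og1", "og2", "og3"},
    "TIGR4": {"og1", "og2", "og5"},
}
print(sorted(core_genome(groups)))       # ['og1', 'og2']
print(rank_by_shared("CGSP14", groups))  # [('Spn23F', 3), ('TIGR4', 2)]
```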
| Table 1 Sixteen published genomes of S. pneumoniae used for comparative genomic analysis |
The genes of the distributed (strain-variable) genome were analyzed further. Alignment analysis revealed at least eight distributed clusters in the CGSP14 genome, most of whose genes are related to virulence or antimicrobial resistance: a lantibiotic synthesis gene cluster, the capsular locus, a large cell wall surface anchor protein, two transposons, a resistance island, a possible phage remnant and a gene cluster of unknown function. The genome-wide G+C content is plotted in Figure . All eight clusters had G+C contents deviating from the genome average, suggesting that they could be recent acquisitions through horizontal gene transfer (HGT) in CGSP14. The distribution of the eight gene clusters among the seventeen pneumococcal genomes is shown in Table . Notably, the two conjugative transposons were unique to CGSP14.
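Flagging clusters with atypical G+C content, as done above, is often approached with a sliding window compared against the genome-wide mean. In the sketch below, the window size, step and the 5-percentage-point threshold are hypothetical parameters chosen for illustration, not the criteria applied to CGSP14. For scale, the 14.4-kb resistance island described later (33.6% versus a genome average of 39.5%) differs by about 6 points.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in `seq` (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def deviant_windows(seq: str, window: int = 5000, step: int = 1000,
                    threshold: float = 0.05) -> list[tuple[int, float]]:
    """(start, G+C fraction) for windows deviating from the genome mean by more than `threshold`."""
    genome_gc = gc_fraction(seq)
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if abs(gc - genome_gc) > threshold:
            hits.append((start, round(gc, 3)))
    return hits


# Toy genome: 40% G+C background with a 2-kb AT-only insert at 10,000 bp.
toy = "GCAAA" * 2000 + "A" * 2000 + "GCAAA" * 2000
print(deviant_windows(toy, window=2000, step=1000))
# [(9000, 0.2), (10000, 0.0), (11000, 0.2)]
```

Composition alone is only suggestive of HGT; ameliorated older acquisitions can drift back toward the genome average.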
| Table 2 Distribution of the eight gene clusters among S. pneumoniae and other species of Streptococcus |
Alignment analysis indicated that chromosomal rearrangements have occurred in S. pneumoniae (Figure ). Compared with the other published pneumococcal genomes, the CGSP14 genome carried chromosomal inversions. A 189-kb inversion spans the replication termination site (from 1,010 kb to 1,199 kb). Inversions across the replication axis are generally believed to rebalance a chromosomal architecture made asymmetric by the insertion of large DNA segments [19]. In the CGSP14 genome, most of the acquired DNA segments (80.5 kb in total), mainly transposons and IS elements, lie to the left of the replication axis (Figure ). These observations suggest that integration of the transposons and IS elements disturbed the balance of the chromosomal architecture, and this imbalance might have driven the inversion in CGSP14. The inversion moved 25 genes from the left to the right of the replication axis. In addition, a 19-kb inversion (from 832 kb to 851 kb) was observed in CGSP14 relative to TIGR4, G54 and INV200, although the gene order in this segment is consistent with that of INV200. The segment is not intact in the other pneumococcal genomes. Further analysis showed that all four rearrangement breakpoints lie within IS elements. Through such chromosomal rearrangements, S. pneumoniae may maintain genome stability after HGT events that confer genes the organism needs to survive or replicate in its environmental niche.
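The effect of an inversion on replichore assignment can be checked with simple coordinate arithmetic: a position p inside an inverted segment [a, b] moves to a + b - p. The sketch below assumes base 1 lies at the origin and uses the approximate 1.1-Mb terminus from the GC skew analysis. The gene coordinates in the example are hypothetical, so it does not reproduce the count of 25 genes reported above, and "replichore 1/2" stands in for the left/right sides of the figure.

```python
TERMINUS_BP = 1_100_000  # approximate terminus (~1.1 Mb) from GC skew; an assumption


def invert(position: int, start: int, end: int) -> int:
    """Coordinate of `position` after the segment [start, end] is inverted."""
    if start <= position <= end:
        return start + end - position
    return position


def replichore(position: int, terminus: int = TERMINUS_BP) -> int:
    """Replichore 1 runs from the origin (base 1) to the terminus; replichore 2 is the rest."""
    return 1 if position < terminus else 2


def genes_switching_replichore(positions: list[int], start: int, end: int,
                               terminus: int = TERMINUS_BP) -> list[int]:
    """Positions whose replichore changes when [start, end] is inverted."""
    return [p for p in positions
            if replichore(p, terminus) != replichore(invert(p, start, end), terminus)]


# The 189-kb inversion reported for CGSP14 (1,010 kb to 1,199 kb); gene positions are hypothetical.
print(invert(1_050_000, 1_010_000, 1_199_000))  # 1159000
print(genes_switching_replichore([500_000, 1_050_000, 1_105_000, 1_150_000],
                                 1_010_000, 1_199_000))  # [1050000, 1150000]
```

Because the inversion's midpoint (about 1,104.5 kb) lies close to the terminus, genes on both flanks trade sides; the net transfer depends on how many genes each flank carries.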
Antimicrobial resistance genes
CGSP14 is resistant to a variety of antimicrobial agents. The antimicrobial resistance determinants of the seventeen pneumococcal strains are compared in Additional file 4. CGSP14 contained 18 antimicrobial resistance determinants, whereas the other strains contained 9 to 12. Nearly half of the antimicrobial resistance determinants in CGSP14 were associated with mobile genetic elements.
The genome contains two large conjugative transposons, both composites of known transposons. The first, a 68-kb conjugative transposon, contains 69 open reading frames (ORFs) (Figure ). Because this transposon had not been described previously, we named it Tn2008. Sequence analysis indicated that Tn2008 is a composite of three transposons. A 50-kb DNA segment carrying a chloramphenicol resistance gene (cat) could be an independent conjugative transposon; at its left terminus, two ORFs were annotated as the integrase and relaxase required for transposition. The ORFs of this segment were highly similar to those of Tn5252, which has been reported previously in S. pneumoniae [20,21]. The Tn5252-like transposon was split into a 46-kb proximal region and a 4-kb distal region by the insertion of a 13-kb segment. An insertion at the same position is present in Spn23F, which contains an 81-kb conjugative transposon [12]. The 13-kb insertion was identified as another independent transposon, which carries its own integrase and excisionase at its right terminus for independent transposition; it also carries three genes conferring erythromycin, streptothricin and kanamycin resistance, similar to the known transposon Tn1545 [22]. A further 5-kb segment, inserted into the 13-kb transposon, resembled Tn917 [23] and contained three ORFs encoding an erythromycin resistance protein (ermB), a resolvase and a transposase. Overall, this novel conjugative transposon, a composite of three transposons, carries five antimicrobial resistance genes.
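The nesting of the three elements in Tn2008 can be written down as a small tree, which also serves as a consistency check on the sizes given above. The sketch assumes that each element's own length excludes the elements inserted into it (46 + 4 = 50 kb for the Tn5252-like backbone), an interpretation under which the parts add up to the reported 68 kb. The class and field names are hypothetical, and resistance genes are labeled only as specifically as the text names them.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A mobile element: its own length (kb), resistance markers, and nested insertions."""
    name: str
    own_kb: float
    markers: list[str] = field(default_factory=list)
    insertions: list["Element"] = field(default_factory=list)

    def total_kb(self) -> float:
        """Own length plus the lengths of all nested insertions."""
        return self.own_kb + sum(child.total_kb() for child in self.insertions)

    def all_markers(self) -> list[str]:
        """Resistance markers of this element and of every nested insertion."""
        found = list(self.markers)
        for child in self.insertions:
            found.extend(child.all_markers())
        return found


tn917_like = Element("Tn917-like", 5, ["erythromycin (ermB)"])
insert_13kb = Element("13-kb insertion (Tn1545-like)", 13,
                      ["erythromycin", "streptothricin", "kanamycin"], [tn917_like])
tn2008 = Element("Tn5252-like backbone", 46 + 4, ["chloramphenicol (cat)"], [insert_13kb])

print(tn2008.total_kb())          # 68
print(len(tn2008.all_markers()))  # 5
```

The same model describes the second element below: an 18-kb Tn916-like transposon carrying a 5-kb Tn917-like insertion gives `Element("Tn916-like", 18, ["tetracycline (tetM)"], [Element("Tn917-like", 5, ["erythromycin (ermB)"])]).total_kb() == 23`.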
The second conjugative transposon in CGSP14 is 23 kb long and contains 23 ORFs (Figure ). Sequence analysis demonstrated that it, too, is a composite, in this case of two transposons. An 18-kb DNA segment carrying a tetracycline resistance gene (tetM) could be an independent transposon; it contains two ORFs encoding an integrase and an excisionase and shows high similarity to Tn916 (Figure ) [24]. The other component, a 5-kb insertion carrying an erythromycin resistance gene (ermB), was identified as a Tn917-like transposon. The structure of this composite transposon, i.e., a Tn916-like transposon carrying an inserted Tn917-like transposon, resembles that of Tn3872, which has been described in S. pneumoniae [25]; we therefore designated this 23-kb conjugative transposon a Tn3872-like transposon.
Among the seventeen S. pneumoniae genomes compared, conjugative transposons of 81 kb and 67 kb were also present in Spn23F and G54, respectively. Both are composed of a Tn916-like transposon and a Tn5252-like transposon [12,16], similar to Tn2008 in CGSP14. However, comparative analysis revealed genetic variation among the three conjugative transposons (Figure ). In CGSP14 and Spn23F, the Tn5252-like transposons carry a chloramphenicol resistance gene; this gene appears to be missing in G54, where it is replaced by an ABC-type antimicrobial peptide transport system. The Tn916-like transposon carries a tetracycline resistance gene in Spn23F, and both a tetracycline and an erythromycin resistance gene in G54. In contrast, the Tn916-like element of Tn2008 in CGSP14 has lost the locus encoding the tetracycline resistance gene, and a DNA segment encoding three antimicrobial resistance genes and a transcriptional repressor, together with a Tn917-like transposon, has been inserted at this position. The variation of antimicrobial resistance determinants among the three conjugative transposons indicates that they have undergone frequent recombination and deletion events since the Tn916-like element integrated into the larger conjugative transposon, probably under different selective pressures.
In addition to the two conjugative transposons carrying antimicrobial resistance genes, we identified a 14.4-kb genomic region in CGSP14 (Figure ) that appears to be a resistance island. The region carries a chloramphenicol resistance gene (cat) and a gene encoding methionyl-tRNA synthetase 2 (metS2). Its average G+C content is 33.6%, much lower than the genome average (39.5%). The island contains several genes associated with genome instability, including a site-specific recombinase and multiple IS elements, which might be responsible for the lateral transfer of the region. Furthermore, its ORFs have diverse phylogenetic origins (data not shown). Based on these features, we regard the 14.4-kb region as a resistance island, which to our knowledge is described here for the first time in S. pneumoniae. The same island is also present in the draft genomes of CGSSp14BS69, CGSSp19BS75, CGSSp9BS68 and SPnINV200, and BLAST comparisons showed that these copies are highly identical to one another. The comparison between CGSP14 and SPnINV200 is shown in Figure . Since the island carries two antimicrobial resistance genes, its presence may be associated with the increased multidrug resistance of these strains.
Besides the antimicrobial resistance determinants associated with mobile genetic elements, CGSP14 has several chromosome-encoded determinants that also contribute to antimicrobial resistance. These include a tellurite resistance protein (tehB), a bacitracin resistance protein (bacA), a cadmium resistance transporter (cadD), a multidrug resistance efflux pump (mdtG), two β-lactam resistance factors (femAB) and three metallo-β-lactamases.
Virulence genes
The polysaccharide capsule is the principal pneumococcal virulence determinant. S. pneumoniae is divided into 91 serotypes on the basis of capsular structure, and studies suggest that certain serotypes have a greater potential to cause invasive disease than others [27,28]. Clinical isolates of S. pneumoniae in Asia belong largely to a limited number of serotypes, namely 6B, 9V, 14, 19F and 23F [29]. In the CGSP14 genome, a 19.4-kb gene cluster (SPCG0345 to SPCG0363) was identified as responsible for the synthesis of the capsular polysaccharide. It is flanked on each side by a truncated or disrupted IS element, remnants of IS1202 and IS1167, respectively. Compared with the capsular loci of the serotype 14 strains 34359 and SPnINV200, determined previously [30], the capsular locus of CGSP14 differs at its 3' end (Additional file 5). In CGSP14, the gene wciY is split into two ORFs (SPCG0358 and SPCG0359). This gene is unique to serotype 14 and its function is unknown; however, a previous study showed that its disruption did not affect capsule production [31]. In addition, the ORF immediately downstream of these two (SPCG0360) lacks five units of a 306-bp tandem repeat compared with the corresponding genes in strains 34359 and SPnINV200. This gene belongs to the surface-anchored protein family, but its function also remains unclear. Apart from these two genes, the genes of the capsular locus are almost identical among the three serotype 14 strains [30]. As described previously [30], serotype 14 uses the Wzx/Wzy-dependent pathway to synthesize its capsular polysaccharide (Additional file 6).
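The size of the SPCG0360 deletion follows directly from the repeat length, and because 306 is divisible by three, losing whole repeat units would be expected to shorten the protein without shifting the reading frame. The sketch below simply encodes that arithmetic; it assumes the five units are removed exactly and lie within the coding sequence, and the function name is hypothetical.

```python
def repeat_deletion(unit_bp: int, units_lost: int) -> tuple[int, bool, int]:
    """Return (deleted bp, whether the reading frame is preserved, codons removed if in frame else 0)."""
    deleted_bp = unit_bp * units_lost
    in_frame = deleted_bp % 3 == 0
    return deleted_bp, in_frame, deleted_bp // 3 if in_frame else 0


print(repeat_deletion(306, 5))  # (1530, True, 510)
```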
In addition to the capsule, S. pneumoniae produces a number of other virulence factors, such as pneumolysin, hydrogen peroxide and cell surface proteins [32]. According to how they are linked to the cell surface, the surface proteins of S. pneumoniae are divided into three families: choline-binding proteins, LPXTG-anchored proteins and lipoproteins [32]. The computationally predicted surface proteins of CGSP14 are listed in Additional file 7.
Several members of the choline-binding protein family are known to be important for virulence, including the autolysin (lytA), choline binding protein A (pspC) and pneumococcal surface protein A (pspA). PspC is involved in bacterial adhesion to the nasopharynx [33]. PspA is highly variable and is involved in inhibiting complement activation [32]. The choline binding protein PcpA is postulated to be an adhesin because it contains leucine-rich repeats [34]. All seventeen pneumococcal genomes harbored one copy of each of these virulence determinants, but CGSP14 and SP19-BS75 carried an additional copy of pspA and pcpA owing to a 7-kb DNA insertion adjacent to a transposase remnant. The 7-kb sequences in CGSP14 and SP19-BS75 were highly identical to each other.
Proteins containing the LPXTG amino acid motif occur in most Gram-positive bacteria. The LPXTG motif near the carboxyl terminus is recognized by a sortase enzyme, which links the protein to the cell wall [32]. Neuraminidase is one of the LPXTG-anchored proteins. It cleaves N-acetylneuraminic acid from oligosaccharides, glycoproteins and glycolipids, and is regarded as a virulence factor in microbial pathogenesis [32]. Analysis of the available S. pneumoniae genome sequences indicated that this organism has at least three neuraminidases [13-15]. All three are present in CGSP14. Both nanA and nanB are present in all of the other sixteen pneumococcal strains, whereas nanC is present in only eight. The presence of nanC might be associated with the increased virulence of some S. pneumoniae strains. Zinc metalloproteases are also members of the LPXTG-anchored protein family. Four zinc metalloproteases have been identified in the published S. pneumoniae genome sequences, and CGSP14 contains three of them: iga, zmpB and zmpD. Zinc metalloproteases belong to a group of hypervariable surface proteins; their hypervariability results from frequent HGT in these regions and enables antigenic escape [35].
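A first-pass screen for LPXTG-anchored proteins can be sketched as a regular-expression search restricted to the C-terminal region. This is a simplification: genuine sortase substrates also carry a downstream hydrophobic segment and a charged tail, and the 50-residue window used here is a hypothetical cutoff, not the prediction method behind Additional file 7.

```python
import re

# L, P, any residue, T, G
LPXTG = re.compile(r"LP.TG")


def c_terminal_lpxtg(protein: str, window: int = 50) -> list[int]:
    """0-based start positions of LPXTG motifs lying within the last `window` residues."""
    cutoff = max(0, len(protein) - window)
    return [m.start() for m in LPXTG.finditer(protein.upper()) if m.start() >= cutoff]


# Hypothetical sequence: motif near the C terminus, then a hydrophobic stretch and a basic tail.
anchored = "M" + "A" * 100 + "LPETG" + "L" * 20 + "KRKRK"
print(c_terminal_lpxtg(anchored))             # [101]
print(c_terminal_lpxtg("LPETG" + "A" * 100))  # []
```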
Besides these common virulence proteins, we found one unusual member of the LPXTG-anchored protein family. The gene SPCG1750 encodes a 4,695-amino-acid protein containing 528 imperfect repeats of the motif SASASAST. This surface protein shows homology to SP1772 (4,776 amino acids) in TIGR4. In CGSP14, the gene lies in the vicinity of nine glycosyltransferase genes, all of which are located on a 40.5-kb segment flanked by two IS elements. This 40.5-kb region appears to have been acquired by HGT in CGSP14 and TIGR4, as it has not been found in other genomes.
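How "imperfect" repeats were counted is not specified above. One simple approach, assumed here for illustration, scans the sequence and greedily counts non-overlapping 8-residue windows that differ from SASASAST by at most one substitution. Insertions and deletions within a copy are ignored, so counts from this sketch need not match the reported 528.

```python
def count_imperfect_repeats(protein: str, motif: str = "SASASAST",
                            max_mismatches: int = 1) -> int:
    """Greedily count non-overlapping windows matching `motif` with <= `max_mismatches` substitutions."""
    k = len(motif)
    count = i = 0
    while i + k <= len(protein):
        mismatches = sum(a != b for a, b in zip(protein[i:i + k], motif))
        if mismatches <= max_mismatches:
            count += 1
            i += k  # skip past this copy so it is not counted twice
        else:
            i += 1
    return count


toy = "SASASAST" * 3 + "SASAGAST" + "MKV"  # three exact copies and one variant
print(count_imperfect_repeats(toy))                    # 4
print(count_imperfect_repeats(toy, max_mismatches=0))  # 3
```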
Lantibiotics are peptide antibiotics with high antimicrobial activity against several Gram-positive bacteria. They are ribosomally synthesized and posttranslationally modified [36]. In CGSP14, we identified a 5.4-kb locus encoding three proteins related to lantibiotic biosynthesis (a lantibiotic dehydratase, a lantibiotic synthetase and a lantibiotic efflux protein) next to a transcriptional regulator (Figure ). A search of the other sixteen genomes showed that this gene cluster is highly similar to the corresponding locus in SPnINV200, another serotype 14 strain. Recent studies reported that the serotype 23 strains SP23-BS72 and Spn23F also contain lantibiotic synthesis gene clusters [12,37]; however, comparative analysis showed that these clusters have no sequence similarity to those in CGSP14 and SPnINV200. Furthermore, a BLAST search against the nr database showed that the serotype 14 locus is highly similar (70%–88% identity) to a locus in Streptococcus thermophilus. The serotype 14 locus might therefore encode a new type of lantibiotic, different from those found in serotype 23. This finding suggests that virulence genes have been exchanged among different Streptococcus species.
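Identity values such as the 70%–88% quoted above depend on how an alignment is normalized. The sketch below assumes two sequences already aligned with "-" for gaps and divides identical columns by the full alignment length (gaps included), which matches the ratio BLAST reports as Identities; other tools normalize by the shorter sequence or by ungapped columns, giving different numbers for the same alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns divided by alignment length (gaps included), as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * identical / len(aligned_a)


print(round(percent_identity("MKT-LLV", "MKTALIV"), 1))  # 71.4
```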
What drives the genome evolution?
In addition to CGSP14, which belongs to ST15, we analyzed 20 clinical isolates of S. pneumoniae serotype 14, all from sterile sites, by multilocus sequence typing (MLST). The most common sequence type was ST876 (7 isolates), followed by ST13 (5) and ST46 (5) (Table ). Only one of the 20 isolates was ST15. ST876 and ST46 are prevalent in Taiwan, whereas ST13 and ST15 are variants of the international England14 clone (ST9). Capsular switching may have occurred between serotypes 14 and 3 and between serotype 14 and serogroup 9, as two isolates (Bsp097 and Bsp098) were ST1569, a sequence type that has been identified in serotypes 3, 9A and 9V. All the clinical serotype 14 isolates showed various levels of nonsusceptibility to penicillin and ceftriaxone (Table ), in accord with recently published data [38]. This finding suggests that the high competence and plasticity of the pneumococcal genome have likely helped strains become increasingly resistant to antimicrobials, and again supports the view that virulent clones might evolve greater resistance in order to survive under antibiotic pressure. Given this, the importance of judicious antibiotic use to reduce the selective pressure cannot be overemphasized.
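MLST assigns a sequence type from the allele numbers at seven housekeeping loci (aroE, gdh, gki, recP, spi, xpt and ddl for S. pneumoniae). The sketch below shows the lookup and a tally of the sequence types reported above. The allele profile in the lookup table is a placeholder, not a real ST profile; in practice, profiles come from the pneumococcal MLST database.

```python
from collections import Counter
from typing import Optional

LOCI = ("aroE", "gdh", "gki", "recP", "spi", "xpt", "ddl")


def assign_st(profile: dict[str, int], st_table: dict[tuple[int, ...], str]) -> Optional[str]:
    """Look up the sequence type for a seven-locus allele profile (None if not in the table)."""
    return st_table.get(tuple(profile[locus] for locus in LOCI))


# Placeholder allele numbers for illustration only; NOT a real ST profile.
st_table = {(1, 1, 1, 1, 1, 1, 1): "ST-example"}
print(assign_st(dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1))), st_table))  # ST-example

# Sequence types of the 20 serotype 14 isolates, as reported in the text.
isolates = ["ST876"] * 7 + ["ST13"] * 5 + ["ST46"] * 5 + ["ST15"] + ["ST1569"] * 2
tally = Counter(isolates)
print(len(isolates), tally.most_common(1))  # 20 [('ST876', 7)]
```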
| Table 3 Multilocus sequence typing and antimicrobial susceptibility of S. pneumoniae serotype 14 isolates |