The genomic heterogeneity within a bacterial species reflects its lifestyle, the niche it occupies, and its exposure to mobile elements, such as bacteriophages and plasmids [
1]. Although organisms belonging to the same genus or species share a common gene set (the core genome), individual strains also carry strain-specific genes that reflect their physiological and virulence properties [
2,
3]. Although not all genetic differences between strains are important for niche adaptation of the bacteria, strain-specific genes are thought to be responsible for the survival of an organism in its chosen niche. This variation can be due to genetic noise (i.e., indels and mobile and selfish DNA) [
4,
5], gene loss [
6,
7], gene duplication [
8] or modification of some of the existing genes [
9,
10]. Acquisition of new genes by lateral gene transfer is a predominant force in bacterial evolution. Laterally acquired genes provide a readily available novel pool of genes for developing physiological properties that are helpful for exploiting a new niche. A recent study suggested that the total known genome content (the pan-genome) of all contemporary
Streptococcus agalactiae strains will increase as hundreds of genomes are sequenced [
11]. Although
S. pyogenes belongs to the same genus, it has a smaller pan-genome and a greater level of recombination in its core genome [
12]. These organisms provide a good model for identifying the causes of genome plasticity in human pathogens.
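The distinction between the core genome, the pan-genome, and strain-specific genes can be illustrated with simple set operations. The following is a minimal sketch, assuming each genome has already been reduced to a set of gene-family identifiers; the function name, strain names, and family IDs are hypothetical and do not come from any of the studies cited above.

```python
def summarize_gene_content(genomes):
    """Split gene content into core, pan-genome, and strain-specific sets.

    genomes: dict mapping a strain name to the set of gene-family IDs it carries.
    """
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)  # families present in every strain
    pan = set.union(*gene_sets)          # families present in at least one strain
    strain_specific = {}
    for strain, genes in genomes.items():
        others = [g for name, g in genomes.items() if name != strain]
        # Families found in this strain and in no other strain.
        strain_specific[strain] = genes - set().union(*others)
    return core, pan, strain_specific


# Hypothetical toy data: strain names and gene-family IDs are made up.
toy_genomes = {
    "strain_A": {"fam1", "fam2", "fam3", "fam4"},
    "strain_B": {"fam1", "fam2", "fam3", "fam5"},
    "strain_C": {"fam1", "fam2", "fam6"},
}

core, pan, strain_specific = summarize_gene_content(toy_genomes)
print("core genome:", sorted(core))
print("pan-genome size:", len(pan))
for strain in sorted(strain_specific):
    print(strain, "strain-specific:", sorted(strain_specific[strain]))
```

In this toy example, a pan-genome that keeps growing as strains are added corresponds to new strains repeatedly contributing families that no earlier strain carried.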
It has long been recognized that serological, genetic, and biochemical variations exist within the species
S. mutans [
13].
S. mutans has been classified into four serotypes (
c,
e,
f, and
k) based on the chemical composition of its cell surface rhamnose-glucose polymers [
14]. We previously developed a multilocus sequence typing (MLST) method using eight housekeeping genes. Ninety-two sequence types (STs) were identified from 102 clinical isolates, indicating that
S. mutans is a diverse population [
15]. In the MLST analysis, serotype
c strains were widely distributed in the dendrogram, while serotype
e, f, and
k strains were differentiated into clonal complexes. This suggests that serotype
c, the dominant serotype among
S. mutans clinical isolates (almost 80%), is the ancestral phenotype of this organism and that serotype
e and
f strains have evolved strain-specific genes. Although differences in modification of cell surface polymers reflect evolutionary trends, differences in cariogenicity have not been observed, and the relationship between serotype and clinical condition remains unclear.
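The way MLST groups isolates into sequence types can be sketched as follows. This is an illustrative example only, assuming each isolate is described by a tuple of allele numbers, one per housekeeping locus in a fixed order; the isolate names and allele numbers are hypothetical, and the ST numbers produced here are arbitrary labels rather than identifiers from any curated MLST scheme.

```python
def assign_sequence_types(profiles):
    """Give every distinct allelic profile its own sequence type (ST).

    profiles: dict mapping an isolate name to a tuple of allele numbers,
              one per locus, with loci in the same order for all isolates.
    """
    st_of_profile = {}
    st_of_isolate = {}
    for isolate, profile in profiles.items():
        if profile not in st_of_profile:
            # First time this combination of alleles is seen: new ST label.
            st_of_profile[profile] = len(st_of_profile) + 1
        st_of_isolate[isolate] = st_of_profile[profile]
    return st_of_isolate


# Hypothetical allelic profiles across eight loci.
toy_profiles = {
    "isolate_1": (1, 1, 1, 1, 1, 1, 1, 1),
    "isolate_2": (1, 1, 2, 1, 1, 1, 1, 1),  # differs from isolate_1 at one locus
    "isolate_3": (1, 1, 1, 1, 1, 1, 1, 1),  # identical to isolate_1
    "isolate_4": (3, 2, 2, 1, 4, 1, 1, 2),
}

sequence_types = assign_sequence_types(toy_profiles)
n_sts = len(set(sequence_types.values()))
print(f"{n_sts} STs among {len(toy_profiles)} isolates")
```

A ratio of STs to isolates close to one, as with the 92 STs among 102 isolates reported above, means that few isolates share an identical allelic profile.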
Studies of individual
S. mutans genes have revealed sequence variations, resulting in altered function of the encoded proteins [
16-
18]. For example, variation has been demonstrated in the occurrence of plasmids [
19,
20], and in mutacin operons [
21], serotype antigens [
22], competence [
23,
24], and the
msm, bgl, cel, and
gtfBC loci [
25-
28]. Waterhouse and Russell recently showed that loci such as
msm, bgl, cel, and
gtfBC, which they called "dispensable genes," are distributed as a mosaic among
S. mutans strains [
27]. They also demonstrated that 20% of the
S. mutans UA159 open reading frames (ORFs) were absent from one or more of the nine test strains, and dispensable ORF blocks (including more than one ORF) were identified by microarray analysis based on the UA159 genome [
28]. Given the wide distribution and diversity of genotypes and genetic loci in
S. mutans, it seems likely that other strains of
S. mutans have both unique and common genetic loci not present on the UA159 genome [
28,
29]. Identifying such loci is useful for charting
S. mutans evolutionary history. However, these analyses are based on only one genome,
S. mutans UA159, even though extensive genomic variation between
S. mutans strains has been predicted [
30].
Genome sequence data are now available for numerous species of bacteria, and comparative evolutionary approaches have revealed positive selection pressure and lateral gene transfer in the evolution of many bacterial species. These analyses have been performed for pathogenic bacteria such as
Helicobacter pylori [
31,
32],
Mycobacterium species [
33],
Chlamydia species [
34],
Escherichia coli [
35], and
Salmonella species [
36]. The pathogenic
Streptococcus species include important human and agricultural pathogens [
12,
37]. More than 30 whole genomes of
Streptococcus spp. belonging to nine different species, including
S. pyogenes, S. pneumoniae, S. agalactiae, S. thermophilus, S. suis, S. sanguinis, S. gordonii, S. equi, and
S. mutans are publicly available. These organisms colonize diverse habitats including tooth, oral mucosal, pharyngeal, respiratory, intestinal, and urinogenital surfaces. These species have acquired various niche-specific genes, mainly by lateral gene transfer. For example,
S. pyogenes acquires or tolerates bacteriophages that are important for new virulence determinants and that induce genomic rearrangement [
38].
S. agalactiae, the main cause of neonatal infection in humans, also tolerates bacteriophages [
11]. Some of these organisms have acquired defense systems against such elements, such as restriction-modification systems or clustered regularly interspaced short palindromic repeats (CRISPRs) [
39,
40]. Sequencing multiple genomes from closely related species that inhabit different niches yields not only an understanding of the pattern of gene movement but also insights into the role of species-specific genes and genome plasticity.
In this context, we determined the whole genome sequence of S. mutans serotype c strain NN2025, isolated in Japan in 2002. We compared its genome sequence, genome structure, and gene variation with those of the serotype c strain UA159, isolated in the United States in 1982, with 95 clinical isolates from Japan and Finland, and with other closely related streptococcal species. Our aims were to provide useful information about the evolutionary events associated with S. mutans strains and Streptococcus spp. and to provide new insights into streptococcal species-specific survival strategies.