Genotypic analyses of Streptococcus mutans using fingerprinting methods rely on differences at only a few genetic loci and do not reveal the underlying genome-wide differences between strains. We used comparative genomic hybridization (CGH) with 70-mer oligonucleotide microarrays representing open reading frames (ORFs) from S. mutans UA159 to examine the genetic diversity of 44 isolates from nine children selected from a local study population in Eastern Iowa. Unique strains (clones) within each child, initially identified by AP-PCR, were confirmed by CGH. Hybridization patterns of the 1948 ORFs varied widely among the test isolates. Between 87 and 237 ORFs failed to give a positive signal in individual isolates. A total of 323 of the UA159 ORFs were absent from one or more of the test strains. These 323 variable genes appeared to be distributed across the entire UA159 genome and across all the predicted functional categories. This set of S. mutans isolates, collected close together in space and time, showed a degree of gene content variation as high as that of a global set of strains examined in a previous publication (Waterhouse et al., 2007). Comparing the frequency of these variable genes, most of which are of unknown function, among strains of different origins (i.e., from hosts of different caries status) could help determine their relevance to S. mutans cariogenicity.
Streptococcus mutans is a bacterial resident of the oral cavity and is considered to be the principal etiological agent of dental caries in humans (13, 23). Because S. mutans can be isolated from individuals both with and without a history of caries (4, 25), differences in the colonizing S. mutans may contribute to the development and progression of caries in different individuals. One such difference may be bacterial load. High levels of S. mutans have been associated with caries (7, 12, 18, 33); however, many people with a high S. mutans load due to high sucrose consumption do not develop caries (25, 29). Another important factor may be the characteristics of the S. mutans strains themselves. Certain S. mutans attributes believed to be important in caries development, including acidogenicity, aciduricity, biofilm-formation potential, and glucan production, are known to vary among isolates and may account for differences in caries experience (23, 24, 28, 30, 38). However, correlation of these specific factors with caries in humans has been difficult to establish. Nevertheless, S. mutans diversity remains an important area of study because it offers one possible explanation for the contrasting caries status of people colonized with S. mutans. The challenge is that the extent of S. mutans diversity and its genetic basis are still not well understood.
Molecular fingerprinting techniques such as RFLP, MLEE, ribotyping, and AP-PCR have been used to demonstrate that S. mutans strains are genetically diverse. A large number of clonal types of S. mutans have been identified, and these vary across both caries-active and healthy populations (6, 8, 10, 11, 20, 21, 22, 26, 34, 35, 39). In a study of S. mutans from over 200 individuals, no two individuals exhibited the same clonal type profile unless they were related to each other (5). Genetic fingerprinting has also demonstrated that S. mutans can be transmitted from parents to children (39) and between children attending the same nursery school class (3). Using RFLP, Kulkarni et al. (19) were the first to show that an individual could carry multiple genotypes. Multiple genotypes have since been isolated from child-parent pairs (27) and from young adults (31). Several studies have found that caries-active subjects harbor more genotypes than caries-free subjects (2, 9, 15, 31), although others have found the opposite (17). The role of concurrent colonization by multiple genotypes in caries development remains unknown. It has been suggested, however, that the increased risk of caries in subjects with multiple genotypes could result from the simultaneous action of strains with different cariogenic potential (2).
While the above observations are informative, current typing methods provide only limited insight into S. mutans genetic diversity. Typing methods survey only one or a few genetic loci; hence the extent of genetic differences across the whole genome among S. mutans clones is not known. Fingerprinting techniques based on variation in housekeeping genes or on random variation at anonymous loci cannot determine whether S. mutans clones differ in overt virulence and associated cariogenicity. By contrast, comparative genomic hybridization (CGH) provides a genome-wide snapshot of differences in S. mutans gene content, including genes potentially important in virulence. The first CGH study, of nine representative S. mutans strains, revealed extensive heterogeneity in gene content among strains from different geographical locations around the world (42). Given the genetic differences observed among isolates collected from geographically dispersed populations, we wanted to determine the level of heterogeneity at a single geographic site. CGH also provided the opportunity to investigate how many genomic differences exist among multiple genotypes found within a single individual, and the degree of variability among isolates from sites with and without caries. In this study, we used CGH to determine the genetic diversity among multiple S. mutans isolates obtained from nine children in a single local study population. Our objectives were to investigate the extent of genome-wide differences among S. mutans strains that were geographically and temporally related and to compare AP-PCR and CGH for genotyping S. mutans strains.
Clinical S. mutans isolates were collected from nine children randomly selected from the 200 participants in a study of caries-inactive and caries-active preschool-age children enrolled in the Head Start program in Eastern Iowa (Table 1). S. mutans was isolated by spiral-plating original plaque samples (in transport medium) and appropriate dilutions onto MSKB (Mitis-Salivarius agar with kanamycin and bacitracin, a selective medium for S. mutans). Presumptive isolates from this selective/differential medium were then identified using a Quad-plate system that assesses specific enzyme activities: fermentation of mannitol, sorbitol, raffinose, and salicin, and arginine decarboxylase activity. S. mutans identification was then confirmed by PCR using the species-specific sequences described by Oho (32). Up to 10 random colonies from each plaque sample were put through this sequential selection scheme. The final number of confirmed S. mutans isolates per child ranged from 2 to 9, and a total of 44 isolates from the nine children were included in our analysis. S. mutans UA159, the strain whose genome has been sequenced (1), was obtained from ATCC and used as the reference strain in our microarray experiments.
Arbitrarily primed polymerase chain reaction (AP-PCR) was used to obtain DNA fingerprints of all S. mutans isolates. AP-PCR was performed as described (22) with some modifications. Amplification was conducted in a thermocycler (PTC-200, MJ Research) programmed with the following temperature profile: an initial 5 min at 94°C, followed by 45 cycles of 30 s at 94°C (denaturation), 30 s at 36°C (annealing), and 1 min at 72°C (elongation), with a final elongation step of 10 min at 72°C. The amplification master mix consisted of 25.5 μL of RNase-free H2O, 5 μL of 10× amplification buffer, 1.0 μL of dNTPs (10 mM each), 14.0 μL of MgCl2 (25 mM), 1 μL of 10 μM primer (OPA-2 or OPA-13, in separate reactions), and 0.5 μL of Taq polymerase. A 2 μL sample of purified S. mutans DNA (25 ng/μL) was added to the master mix. PCR products were separated by gel electrophoresis, and gels were stained with ethidium bromide and visualized under UV light. Isolates were considered to belong to the same clonal type if both their OPA-2 and OPA-13 fingerprints were identical.
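The clonal-type rule above (same type only if both the OPA-2 and OPA-13 fingerprints match) can be sketched as grouping isolates on the pair of band patterns. This is a minimal sketch that assumes each gel lane has already been scored into a set of band sizes in bp; the isolate IDs and band values are hypothetical, and real gel comparison would need a sizing tolerance rather than exact matching.

```python
from collections import defaultdict

def assign_clonal_types(fingerprints):
    """Group isolates whose OPA-2 AND OPA-13 fingerprints are both identical.

    fingerprints: {isolate_id: (opa2_bands, opa13_bands)}, each a collection of
    scored band sizes in bp (hypothetical encoding of a gel lane).
    Returns a sorted list of clonal groups, each a sorted list of isolate IDs.
    """
    groups = defaultdict(list)
    for isolate, (opa2, opa13) in fingerprints.items():
        # frozenset ignores band order and makes the pair usable as a dict key
        key = (frozenset(opa2), frozenset(opa13))
        groups[key].append(isolate)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical band calls (bp) for four isolates from one child
example = {
    "iso1": ({350, 600, 1200}, {400, 900}),
    "iso2": ({1200, 600, 350}, {900, 400}),
    "iso3": ({350, 600, 1200}, {400, 750}),  # matches iso1 at OPA-2 only
    "iso4": ({300, 800}, {400, 900}),        # matches iso1 at OPA-13 only
}
print(assign_clonal_types(example))  # [['iso1', 'iso2'], ['iso3'], ['iso4']]
```

Requiring a match at both primers is what separates iso3 and iso4 from iso1 even though each shares one fingerprint with it.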
Genomic DNA from S. mutans was isolated using the GenElute Bacterial Genomic DNA kit (Sigma, Saint Louis) and sheared into 200 to 1000 bp fragments in a cup sonicator (Misonix Sonicator 3000 at a setting of 4 for 20 s). A total of 4 μg of sheared DNA per sample was fluorescently labeled with the BioPrime Plus Array CGH Indirect Genomic Labeling System (Invitrogen, CA) according to the manufacturer's protocol, using a modified 25× deoxynucleoside triphosphate mix consisting of 12.5 mM each of dATP, dGTP, and dCTP; 2.1 mM dTTP; and 10.4 mM aminoallyl-dUTP. Fluorescently labeled DNA was purified with Qiagen PCR purification columns. DNA and dye concentrations were determined using a Beckman spectrophotometer.
S. mutans microarray slides were obtained through the NIDCR Oral Microbe Microarray Initiative, a collaborative effort between the NIDCR and the NIAID Pathogen Functional Genomics Resource Center (PFGRC) at the J. Craig Venter Institute. Open reading frames (ORFs) from the S. mutans UA159 genome were represented by unique 70-mer oligonucleotides on the slides. The UA159 genome currently has 1960 identified ORFs, of which 1948 are represented on the S. mutans array.
For each comparative genomic hybridization, equal amounts of DNA (3 μg) from a test strain and the reference strain UA159, labeled with different fluorescent dyes, were combined. The combined DNA was concentrated using a YM-30 Microcon column (Millipore, MA) and mixed with hybridization solution to a final volume of 60 μL. Prehybridization, hybridization, and post-hybridization washes were carried out following the protocol provided by the Pathogen Functional Genomics Resource Center, where the slides were produced (http://pfgrc.jcvi.org/index.php/microarray/protocols.html), with one exception: ArrayIt HybIt hybridization solution (TeleChem International) was used instead of the suggested formamide-based buffer, and hybridization was performed at 62°C. Each test strain was hybridized twice against UA159 on two slides in a dye-swap design: once with the test strain labeled with Alexa Fluor 555 (Cy3-equivalent dye) and UA159 labeled with Alexa Fluor 647 (Cy5-equivalent dye), and once with the test strain labeled with Alexa Fluor 647 and UA159 labeled with Alexa Fluor 555.
Arrays were scanned with a ChipReader laser scanner (Bio-Rad) at 10 μm resolution at 532 and 635 nm. The resulting images were processed with Spotfinder software from The Institute for Genomic Research (TIGR) to retrieve probe intensities, using the annotation file provided with the slides. Normalization (log mean centering) and filtering were done using the TIGR Microarray Data Analysis System (MIDAS). Data quality and normalization effects were assessed by examining plots of M versus A, where M = log2(reference/test) and A = log2(reference × test)/2. Final viewing and clustering of the processed data were performed using TIGR MultiExperiment Viewer (MeV). Dye swaps were treated as replicates after normalization. To derive a cut-off for deletion detection, signal ratios of all ORFs from six UA159 self-self hybridizations were examined: all spots had a log2 intensity ratio (reference/reference) between −2 and 2, with the majority concentrated around 0. The range of signal ratios in reference-test hybridizations was also examined, and a set of 50 ORFs spanning this range was selected and tested for presence in the test strains by PCR (see below). Based on these analyses, deleted ORFs were defined as those with an average log2 intensity ratio (reference/test) greater than 2. Over 200 additional PCR verifications performed throughout this study showed that this cut-off gave a low false-negative rate of <0.7% (i.e., ORFs positive by PCR but classified by CGH as absent).
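The M and A quantities and the deletion call described above can be sketched as follows. This is a simplified illustration, not the MIDAS pipeline: it assumes background-corrected intensities per spot, applies log mean centering separately to each dye-swap slide, averages the two replicates, and calls an ORF absent when the mean log2(reference/test) exceeds 2. The ORF IDs and intensities are hypothetical.

```python
import math
from statistics import mean

def m_values(ref, test):
    # M = log2(reference / test) for each spot
    return [math.log2(r / t) for r, t in zip(ref, test)]

def a_values(ref, test):
    # A = log2(reference * test) / 2, the intensity axis of an M-A plot
    return [math.log2(r * t) / 2 for r, t in zip(ref, test)]

def mean_center(ms):
    # Log mean centering: shift one slide's M values so their mean is 0
    mu = mean(ms)
    return [m - mu for m in ms]

def call_absent(orfs, slide1, slide2, cutoff=2.0):
    """slide1/slide2: (reference, test) intensity lists from the two dye-swap
    slides, each already read from the correct channel so both give ref/test.
    Returns {orf: bool}, True when the averaged normalized M exceeds cutoff."""
    m1 = mean_center(m_values(*slide1))
    m2 = mean_center(m_values(*slide2))
    return {orf: (a + b) / 2 > cutoff for orf, a, b in zip(orfs, m1, m2)}

# Hypothetical 8-ORF slides in which orf3 gives almost no test-strain signal
orfs = [f"orf{i}" for i in range(1, 9)]
ref = [1000.0] * 8
test1 = [1000.0] * 8
test1[2] = 40.0
test2 = [1000.0] * 8
test2[2] = 60.0
calls = call_absent(orfs, (ref, test1), (ref, test2))
print([orf for orf, absent in calls.items() if absent])  # ['orf3']
```

On a real slide, centering runs over all ~1948 spots, so a handful of deleted ORFs barely shifts the mean; in this 8-spot toy the shift is visible but still leaves orf3 well above the cut-off.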
PCR amplification of selected ORFs in test strains was used to validate microarray hybridization results or to resolve ambiguous ones. Primers internal to the target ORF (or flanking it, if the ORF was too small) were designed from the UA159 genome sequence using Lasergene (DNASTAR, Madison). PCR was carried out in 25 μL reactions containing 1× AccuPrime SuperMix II (Invitrogen), 1 μM of each primer, and 50 ng of template DNA in a PTC-200 DNA Engine thermal cycler (MJ Research). Thermocycling parameters were: initial denaturation for 2 min at 94°C; 30 cycles of 30 s at 94°C, 30 s at the annealing temperature (5°C below the lower Tm of the two primers), and 1 min at 72°C; and a final 10 min extension at 72°C. When a PCR failed to generate the expected product, PCR of the 16S rRNA gene was performed to confirm the quantity and quality of the DNA template. Templates negative for the targeted ORF but positive for the 16S rRNA gene were considered truly negative for that ORF.
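The annealing-temperature rule and the 16S rRNA control logic can be written as two small helpers. This is a sketch; the function names are hypothetical, and the "inconclusive" outcome for a template that also fails the 16S control is an assumption, since the text only describes the case where the control succeeds.

```python
def annealing_temp(tm_fwd, tm_rev, offset=5.0):
    # Anneal 5 °C below the lower Tm of the two primers
    return min(tm_fwd, tm_rev) - offset

def interpret_pcr(target_band, control_16s_band):
    """Call one ORF in one strain from the target PCR and, if needed,
    the 16S rRNA control PCR (both given as True/False for band seen)."""
    if target_band:
        return "present"
    if control_16s_band:
        return "absent"        # template is amplifiable, so the negative is trusted
    return "inconclusive"      # assumed: template problem, reaction to be repeated

print(annealing_temp(60.2, 58.0))  # 53.0
print(interpret_pcr(False, True))  # absent
```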
For each of the nine children, multiple S. mutans isolates (2 to 9) were genotyped. DNA fingerprints obtained by AP-PCR with two primers showed that the multiple isolates within each of 7 of the 9 children were identical. For the remaining two children, both with high caries, two distinct fingerprints were found among 5 and 7 isolates, respectively. When AP-PCR fingerprints were compared across children, a total of 11 unique AP-PCR clones were established, and each was assigned a clonal designation (Supplementary Figure 1 and Table 1).
AP-PCR genotyping results agreed well with CGH genotypes based on hybridization patterns of the 1948 ORF probes. A single CGH clonal type was found in each of the 7 children with a single AP-PCR type, and two CGH clonal types were found in each of the two children with multiple AP-PCR types. CGH indicated that the two clonal types within each of these two children were very different, with hybridization patterns differing for 97 and 115 ORFs, respectively. While the extent of differences among clones was difficult to assess from AP-PCR fingerprint gel images, CGH provided a clear genome-wide view of the similarities and differences among S. mutans clones from all children. LAM07CT1 and SC106CT1 were the most closely related clones, differing by only 25 ORFs at 8 genomic loci. By contrast, LAMCT1 and DURCT1 differed by 184 ORFs and were the most distant clones among all pairwise comparisons.
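Pairwise clone comparisons such as the 25-ORF and 184-ORF differences above amount to counting ORFs whose presence/absence call differs between two CGH profiles (a Hamming distance). A minimal sketch with hypothetical clone names and six hypothetical ORFs; it counts differing ORFs only and does not group them into loci.

```python
from itertools import combinations

def orf_differences(profiles):
    """profiles: {clone: {orf: present?}} over the same ORF set.
    Returns {(clone_a, clone_b): number of ORFs whose call differs}."""
    diffs = {}
    for a, b in combinations(sorted(profiles), 2):
        diffs[(a, b)] = sum(profiles[a][orf] != profiles[b][orf] for orf in profiles[a])
    return diffs

def closest_and_most_distant(diffs):
    # Pairs with the fewest and the most differing ORFs
    return min(diffs, key=diffs.get), max(diffs, key=diffs.get)

orfs = ["orf1", "orf2", "orf3", "orf4", "orf5", "orf6"]
calls = {
    "cloneA": [1, 1, 1, 1, 1, 1],
    "cloneB": [1, 1, 0, 1, 1, 1],
    "cloneC": [0, 0, 0, 1, 0, 1],
}
profiles = {c: dict(zip(orfs, map(bool, v))) for c, v in calls.items()}
diffs = orf_differences(profiles)
print(diffs)  # {('cloneA', 'cloneB'): 1, ('cloneA', 'cloneC'): 4, ('cloneB', 'cloneC'): 3}
print(closest_and_most_distant(diffs))  # (('cloneA', 'cloneB'), ('cloneA', 'cloneC'))
```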
CGH also showed that minor differences may exist among multiple isolates of the same clone within a child. Among nine isolates from child number 56 designated as clone ARM11CT1, CGH indicated differences in hybridization patterns for three ORFs: SMU.41, SMU.343, and SMU.1231c. Hybridization differences for two ORFs, SMU.343 and SMU.176, were also observed among four isolates designated as DUR08CT1. These differences were confirmed by PCR amplification using primer pairs specific to these loci (Supplementary Figure 2). Both ARMCT1 and DUR08CT1 were from children with high caries.
Of the 1948 UA159 ORFs represented on the microarray, 323 (16.6%) were identified as variable ORFs because they failed to give a positive signal in at least one of the 11 unique test strains (Supplemental Table 1). These variable ORFs were either absent from the test strain or sufficiently divergent in sequence from UA159 to prevent a positive hybridization signal. The number of ORFs absent from individual strains varied considerably, ranging from 87 (4.5%) in clone AW107CT1 to 237 (12.0%) in clone DUR08CT1 (Table 1). Most variable ORFs failed to hybridize with more than one strain; only 44 (13.6%) of the variable ORFs were absent from a single strain, and 18 (5.6%) were absent from all 11 strains.
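The variable-ORF tallies in this paragraph (absent from at least one strain, from exactly one, or from all strains) can be computed from per-strain absence calls. A sketch assuming each strain's CGH result is available as a set of absent ORF IDs; the strains and ORFs in the example are hypothetical.

```python
from collections import Counter

def summarize_variable_orfs(absent_by_strain):
    """absent_by_strain: {strain: set of ORF IDs that failed to hybridize}."""
    n_strains = len(absent_by_strain)
    # Number of strains lacking each ORF
    freq = Counter(orf for absent in absent_by_strain.values() for orf in absent)
    return {
        "variable": len(freq),  # absent from at least one strain
        "absent_in_one": sum(1 for n in freq.values() if n == 1),
        "absent_in_all": sum(1 for n in freq.values() if n == n_strains),
        "per_strain": {s: len(a) for s, a in absent_by_strain.items()},
    }

example = {
    "strain1": {"o1", "o2", "o3"},
    "strain2": {"o2", "o3"},
    "strain3": {"o3", "o4"},
}
print(summarize_variable_orfs(example))
# {'variable': 4, 'absent_in_one': 2, 'absent_in_all': 1,
#  'per_strain': {'strain1': 3, 'strain2': 2, 'strain3': 2}}

# Share of the array: the study's 323 variable ORFs out of 1948 probes
print(round(100 * 323 / 1948, 1))  # 16.6
```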
The 323 variable ORFs were distributed along the UA159 genome (Figure 1). However, only 15 (4.6%) of them were stand-alone deletions with no adjacent variable ORF. Most variable ORFs formed deletion blocks containing variable numbers of ORFs. Occasionally, individual variable ORFs or deletion blocks were separated by only a few ORFs. When variable ORFs separated by no more than two ORFs were considered to belong to the same block, the 323 variable ORFs were distributed among 50 genomic loci. Loci with large numbers of variable ORFs tended to have a G + C composition atypical for the S. mutans genome (Figure 1). The two largest deletion blocks contained 34 ORFs (SMU.191c–SMU.266c) and 38 ORFs (SMU.1329–SMU.1374), respectively, and corresponded to the genomic islands TnSmu1 and TnSmu2 identified previously (1). These islands were not necessarily present or absent in their entirety in individual strains; the corresponding regions varied in length and ORF complement among strains.
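The locus-building rule above (variable ORFs separated by no more than two ORFs belong to the same block) is a single pass over genome-ordered positions. A sketch assuming each variable ORF is given as its index in the ordered list of UA159 ORFs; the helper names and positions are hypothetical. Setting `max_gap=0` gives strict adjacency, which is how stand-alone deletions would be picked out.

```python
def group_into_loci(variable_positions, max_gap=2):
    """Merge variable ORFs into loci.

    variable_positions: genome-order indices of variable ORFs. Two consecutive
    variable ORFs stay in the same locus when at most `max_gap` other ORFs
    lie between them.
    """
    loci = []
    for pos in sorted(variable_positions):
        if loci and pos - loci[-1][-1] - 1 <= max_gap:
            loci[-1].append(pos)
        else:
            loci.append([pos])
    return loci

def standalone_orfs(variable_positions):
    # Variable ORFs with no directly adjacent variable ORF
    return [l[0] for l in group_into_loci(variable_positions, max_gap=0) if len(l) == 1]

positions = [3, 4, 5, 8, 20, 22, 40]
print(group_into_loci(positions))   # [[3, 4, 5, 8], [20, 22], [40]]
print(standalone_orfs(positions))   # [8, 20, 22, 40]
```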
Some genomic loci potentially associated with important S. mutans phenotypes varied among strains. SMU.100-SMU.116, a genomic island associated with carbohydrate metabolism, includes genes for a sorbose-specific and a fructose-specific phosphotransferase system. Four strains lost the genes for the fructose system, and three lost the genes for both uptake systems. One strain also lost the mannose-specific phosphotransferase genes SMU.1878 and SMU.1879. Several genomic loci associated with bacteriocin production were also heterogeneous. At least one ORF in the region encoding Mutacin 1 (SMU.651-SMU.658) was lost in four strains. Half of the ORFs in region SMU.1803-SMU.1818, associated with the scn bacteriocin, were absent in seven strains. In the region associated with blp bacteriocins (SMU.1888-SMU.1917), only two strains had the same complement of ORFs as UA159; the other strains lost 1 to 15 of the ORFs in this region.
To determine which functional groups the variable ORFs belong to, we classified them into clusters of orthologous groups (COGs) (36). Figure 2 shows the number of variable ORFs in each COG category. Many variable ORFs could not be classified and are of unknown function. A considerable number are involved in DNA replication, recombination, and repair, and in carbohydrate transport and metabolism. Relatively few were involved in amino acid and nucleotide transport and metabolism, cell wall/membrane biogenesis, translation, or coenzyme transport and metabolism.
Of the 11 unique test strains, 6 were from children with high caries and 5 from children with no or low caries. The number of variable ORFs deleted ranged from 87 to 237 in high caries strains and from 113 to 208 in low caries strains. There was no correlation between the number of variable ORFs and the DMFS scores of the children. Because a deletion block containing multiple variable ORFs is likely the result of a single genetic event at a specific genomic location, strains can also be compared by the number of altered loci rather than individual ORFs. Between 19 and 28 (mean 22) variable genomic loci were found in low caries strains, and between 14 and 36 (mean 26) in high caries strains.
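The paragraph reports no correlation between variable-ORF counts and DMFS scores but does not name the test used. One common choice for small samples of this kind is Spearman's rank correlation; the pure-Python sketch below (with average ranks for ties) is offered as an illustration under that assumption, not as the authors' analysis, and it computes only the coefficient, not a p-value.

```python
def average_ranks(values):
    # 1-based ranks; tied values share the mean of the ranks they occupy
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    # Pearson correlation computed on the ranks of x and y
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

print(average_ranks([5, 1, 5]))  # [2.5, 1.0, 2.5]
```

In use, `x` would hold each strain's number of absent ORFs (or altered loci) and `y` the DMFS score of the child it came from; values near 0 indicate no monotonic association.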
No ORF was present in all six high caries strains and absent from all five low caries strains, or vice versa. Twenty ORFs were deleted in three to five low caries strains but in only zero to two high caries strains. For example, SMU.1896c, encoding a hypothetical protein, was deleted in all five low caries strains but in only one high caries strain. Similarly, an even larger number of ORFs were deleted more often in high caries strains than in low caries strains.
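The screen described above (ORFs deleted in three to five low caries strains but in zero to two high caries strains) can be sketched as a count-and-threshold filter over per-strain absence sets. The thresholds are parameters; applying the same rule in the high caries direction is an assumption, since the text does not state the criterion used there. Strain and ORF names in the example are hypothetical.

```python
def differential_deletions(absent_by_strain, low, high, min_in_group=3, max_in_other=2):
    """Return ORFs deleted in >= min_in_group strains of one caries group and
    in <= max_in_other strains of the other, keyed by the group where they
    are more often deleted."""
    def n_deleted(orf, group):
        return sum(orf in absent_by_strain[s] for s in group)

    all_orfs = set().union(*absent_by_strain.values())
    result = {"low": [], "high": []}
    for orf in sorted(all_orfs):
        n_low, n_high = n_deleted(orf, low), n_deleted(orf, high)
        if n_low >= min_in_group and n_high <= max_in_other:
            result["low"].append(orf)
        if n_high >= min_in_group and n_low <= max_in_other:
            result["high"].append(orf)
    return result

absent = {
    "L1": {"a", "b"}, "L2": {"a"}, "L3": {"a", "c"},
    "H1": {"b", "c"}, "H2": {"c"}, "H3": {"b"},
}
print(differential_deletions(absent, ["L1", "L2", "L3"], ["H1", "H2", "H3"],
                             min_in_group=2, max_in_other=1))
# {'low': ['a'], 'high': ['b', 'c']}
```

For the study's groups the call would pass the five low caries and six high caries strain IDs with the default thresholds of 3 and 2.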
Genetic fingerprinting of S. mutans isolates by AP-PCR and other typing methods indicates that S. mutans strains are, overall, genetically very heterogeneous. Although a completely annotated S. mutans genome sequence has been available since 2002, much remains to be learned about the genetic basis of S. mutans diversity. We used CGH to determine the genetic variation among 44 geographically and temporally related S. mutans isolates collected from children enrolled in an Eastern Iowa Head Start program. Two key findings emerged from this Iowa cohort. First, we found a high degree of gene content variation in this set of S. mutans strains. Second, AP-PCR and CGH were equally good at identifying different S. mutans clones within this set of isolates; where AP-PCR identified more than one clone within a single person, CGH confirmed that these strains were genetically distantly related.
A similar degree of genetic variation was observed in a previous comparative genomic study of nine selected S. mutans strains collected between 1966 and 2001 from three continents (41, 42). In that study, the number of ORFs missing among the 9 strains ranged from 113 to 227 (calculated from the accompanying online supplemental data), compared with 87 to 237 in our 11 strains. Although both studies included a strain with the same number of deleted ORFs (113), the two strains had very different sets of absent ORFs (only 67 ORFs in common) and were therefore different strains. The total number of variable ORFs, defined as absent from at least one S. mutans strain, was 323 in our study, compared with 385 in the nine global strains studied by Waterhouse et al. (42). Three hundred of these variable ORFs were common to both studies. Variable ORFs identified in only one study were distributed among variable regions across the genome. The largest block of variable ORFs unique to our study consisted of three ORFs, SMU.1315c to SMU.1317c, encoding a putative ATP-binding protein and two hypothetical proteins. The largest block of variable ORFs unique to the Waterhouse et al. study consisted of eight ORFs (SMU.132 to SMU.143) encoding putative transcriptional regulators and metabolic enzymes; these eight ORFs, however, were consistently present in all our strains. Overall, gene content diversity among S. mutans from our local sample was similar to that of the global sample, suggesting that S. mutans has a fluid and rapidly evolving genome structure, likely similar to that found in other streptococci. Previous studies of Streptococcus pneumoniae and Streptococcus agalactiae have shown considerable variability in gene content. An analysis by Hiller et al. of 17 S. pneumoniae genomes indicated that differences per strain pair ranged from 35 to 629 orthologous gene clusters, with each strain's genome containing between 21% and 32% noncore genes (14). Similarly, Tettelin et al. analyzed the genomes of eight S. agalactiae strains and showed that ~20% of the genes are not shared among all strains (37).
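The overlap figures for the two studies follow from simple set arithmetic on the reported counts: with 323 variable ORFs here, 385 in Waterhouse et al., and 300 shared, the union and the study-specific counts are derived below. The Jaccard index is included only as one convenient summary of overlap; it is not a statistic reported by either study.

```python
def overlap_summary(n_a, n_b, n_shared):
    """Set arithmetic from counts |A|, |B| and |A ∩ B|."""
    union = n_a + n_b - n_shared
    return {
        "union": union,
        "only_a": n_a - n_shared,
        "only_b": n_b - n_shared,
        "jaccard": n_shared / union,
    }

# Counts from the text: this study (A) vs. Waterhouse et al. (B)
summary = overlap_summary(323, 385, 300)
print(summary["union"], summary["only_a"], summary["only_b"])  # 408 23 85
```

This gives 408 variable ORFs across both studies, 23 found only in the Iowa strains and 85 only in the global strains; the same function applied to the two 113-ORF strains (67 shared) gives 159 distinct absent ORFs between them.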
Our CGH experiments were limited to the detection of genes present in the UA159 genome. Other S. mutans strains are likely to carry many additional novel ORFs and genomic islands. Assuming a genome size similar to that of UA159 and an average size of 1 kb per S. mutans gene, we estimate that our test strain with 237 ORFs absent by hybridization could carry as much as 237 kb of novel gene-coding sequence in place of the missing UA159 sequence. This represents almost 12% of the 2.03 Mb UA159 genome, a larger difference than expected from natural variation in genome size. As more S. mutans strains are sequenced, we will be able to infer more about genome size variation in S. mutans. In other streptococci, genome sizes ranged from 2.13 to 2.2 Mb for three completely annotated Streptococcus agalactiae genomes and from 1.84 to 1.94 Mb for 11 completely annotated Streptococcus pyogenes genomes (NCBI). These variations were within 5% of the average genome size of each species, well below the 12% we calculated among the 11 test S. mutans strains. S. mutans strains showing the greatest number of absent ORFs by CGH will be prime candidates for additional genome sequencing, because the probability of discovering novel genes in these strains is high. Using a finite-supragenome model, Hiller et al. predicted that 33 representative S. pneumoniae genomes would need to be sequenced to identify 99% of the orthologous gene clusters represented in the S. pneumoniae population at frequencies of 0.1 (14). By analogy, a substantial number of S. mutans genomes will need to be sequenced to identify most S. mutans genes.
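The back-of-the-envelope estimate above can be written out explicitly. The sketch assumes, as the text does, 1 kb per gene and a 2.03 Mb reference genome; for the other species it uses the midpoint of each reported size range as a stand-in for the species average, which is an approximation.

```python
def novel_sequence_estimate(absent_orfs, kb_per_gene=1.0, reference_mb=2.03):
    """Novel coding sequence (kb) if absent UA159 genes are replaced one-for-one,
    and that amount as a percentage of the reference genome."""
    kb = absent_orfs * kb_per_gene
    return kb, 100 * kb / (reference_mb * 1000)

def max_deviation_pct(min_mb, max_mb):
    # Largest deviation from the midpoint of a size range, as % of the midpoint
    mid = (min_mb + max_mb) / 2
    return 100 * (max_mb - mid) / mid

kb, pct = novel_sequence_estimate(237)
print(f"{kb:.0f} kb, {pct:.1f}% of UA159")                    # 237 kb, 11.7% of UA159
print(f"S. agalactiae: {max_deviation_pct(2.13, 2.2):.1f}%")  # S. agalactiae: 1.6%
print(f"S. pyogenes: {max_deviation_pct(1.84, 1.94):.1f}%")   # S. pyogenes: 2.6%
```

Both species ranges stay well inside the 5% bound quoted in the text, while the S. mutans estimate is roughly four to seven times larger.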
Variable ORFs were distributed across 19 of the 21 COG functional categories. They were overrepresented in COG categories involved in defense mechanisms; carbohydrate transport and metabolism; secondary metabolite biosynthesis; and replication, recombination, and repair. By contrast, COG categories involved in basic cellular machinery were not substantially represented among the variable ORFs. The survival of a pathogen in new environmental niches is increasingly associated with genes involved in essential metabolic and catabolic pathways (40). Many of these so-called “life-style genes” were found among the variable ORFs of S. mutans and may be important in determining its pathogenicity. Such variable genes are often acquired by horizontal transfer of pathogenicity islands that also retain genes such as transposases, integrases, and potential remnants of integrated plasmids. This may explain why a large number of variable genes are associated with replication, recombination, and repair.
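The text does not state how overrepresentation was assessed. One standard approach, shown here as an assumption, compares the observed number of variable ORFs in a COG category with the number expected from that category's share of the array, and computes a one-sided hypergeometric tail probability. Any category counts passed to the function are hypothetical.

```python
from math import comb

def category_enrichment(k, n_variable, K, N):
    """k: variable ORFs in one COG category; n_variable: all variable ORFs;
    K: array ORFs in that category; N: all array ORFs.
    Returns (observed/expected ratio, one-sided hypergeometric P(X >= k))."""
    expected = n_variable * K / N
    upper = min(K, n_variable)
    # Exact integer arithmetic, so large N (e.g. 1948) does not overflow
    tail = sum(comb(K, i) * comb(N - K, n_variable - i) for i in range(k, upper + 1))
    return k / expected, tail / comb(N, n_variable)

# Toy check: 4 ORFs, 2 in the category, 2 variable, both in the category
print(category_enrichment(2, 2, 2, 4))  # (2.0, 0.16666666666666666)
```

For the study itself the call would use `n_variable=323` and `N=1948`, with `k` and `K` taken from the per-category counts in Figure 2; with 21 categories tested, a multiple-testing correction would also be appropriate.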
Comparison of CGH and AP-PCR results shows that, when performed properly, AP-PCR fingerprinting can reliably distinguish multiple S. mutans clones within individual subjects; in our study it also distinguished clones from different individuals. However, band patterns carry no information about gene content, so fingerprints cannot be used to assess the degree of relatedness between two clones in any informative way. CGH, by contrast, provided a genome-wide view of differences in gene content.
When multiple isolates of two distinct clones from two children with high caries were analyzed, CGH indicated minor differences in gene content within each clone (differences in hybridization patterns at a few ORFs). These differences may represent recent evolutionary changes resulting from active recombination or deletion within the local S. mutans population. Further tests are needed to reproduce this finding. If such microevolution proves to be a common phenomenon, it would be interesting to examine whether its rate differs between caries-active and caries-free populations.
While variation among the 11 unique S. mutans strains tested was large, the differences between the 6 strains from children with high caries and the 5 strains from children with low caries were not significant given the small sample size; however, we made several observations that warrant further investigation. The glucan-binding protein gene gbpA was present in all 11 strains. Several genes associated with bacteriocin production and carbohydrate metabolism, and many of unknown function, were differentially deleted in some but not all low and high caries strains. Using a high-throughput microarray platform, Library on a Slide (43), we are currently investigating the prevalence of these genes in a population-based collection of S. mutans from people with and without caries, in order to evaluate whether these genes are associated with S. mutans cariogenicity.
R01-DE 014889, Carver Foundation; UA159 microarrays were provided through a grant from the NIAID-sponsored Pathogen Functional Genomics Resource Center (PFGRC) of the J. Craig Venter Institute.