In addition to the two CRISPR loci previously described in S. thermophilus, we report here the identification of CRISPR3 in the LMD-9 genome (20). Interestingly, this particular CRISPR locus is not ubiquitous in S. thermophilus genomes. Among the three CRISPR loci present in S. thermophilus genomes, diversity is observed at many levels, including (i) the typical CRISPR repeat sequence; (ii) the cas gene content, organization, and sequence; (iii) locus architecture and content; and (iv) spacer content, arrangement, and sequence. Diversity was also observed at each of the three CRISPR loci among 124 different S. thermophilus strains. Specifically, CRISPR1 was ubiquitous, whereas CRISPR2 was present in 59 of 65 strains, and CRISPR3 was present in 53 of 66 strains. A total of 49 strains (39.5%) carried all three loci.
Comparative genome analysis of CRISPR content in streptococci and various bacterial genera and species indicates that the three S. thermophilus CRISPR loci are distributed differently. Notably, CRISPR1 is present in only a few streptococci, whereas CRISPR3 can be found in most Streptococcus species. The distribution of these three CRISPR loci suggests that CRISPR1 may have recently become more specific to a few streptococcal species, whereas CRISPR3 is more widespread across streptococci, and CRISPR2 may be a vestige of a gram-positive ancestor. This is consistent with the absence of CRISPR2 and/or CRISPR3 in various S. thermophilus strains. In fact, detailed sequence analysis of distinct CRISPR3 locus architectures in various S. thermophilus strains suggests that deletions may have occurred via homologous recombination events involving CRISPR3 repeats, likely including the degenerate repeat in the vicinity of serB (Fig. ).
When equivalent CRISPR loci are compared between strains, a high degree of polymorphism is observed in spacer content and sequence. Specifically, 105 of 124, 7 of 59, and 20 of 53 unique spacer arrangements were observed for CRISPR1, CRISPR2, and CRISPR3, respectively, indicating that the overall CRISPR content was unique in most strains. The polymorphism observed in the spacer content of the three CRISPR loci across S. thermophilus strains may be an indicator of locus activity, whereby spacer hypervariability is directly correlated with historical phage exposure. Arguably, the degree of spacer polymorphism of a given CRISPR locus, measured both as the total number of unique spacers and as the total number of unique spacer arrangements, could be directly correlated with its activity. Consequently, we propose that CRISPR1 is the most active locus in S. thermophilus, followed by CRISPR3. This is supported by several observations: (i) repeat degeneracy seems to correlate inversely with relative activity, the most degenerate repeats being found in the least active locus, namely, CRISPR2; (ii) spacer size is more highly conserved in the most active loci, namely, CRISPR1 and CRISPR3, and least conserved in the least active locus, CRISPR2; (iii) the average and maximum numbers of spacers are highest for CRISPR1 and lowest for CRISPR2; and (iv) the number of CRISPR BIMs obtained is higher for CRISPR1 than for CRISPR3.
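As a rough illustration of the ranking argument, the counts reported above can be turned into a simple ratio of unique spacer arrangements to the number of strains carrying each locus. This ratio is an illustrative proxy of our own choosing, not a metric defined in this study, and it ignores spacer number, spacer size, and repeat degeneracy, which the observations listed above also take into account.

```python
# Reported counts per locus: (unique spacer arrangements, strains carrying the locus)
counts = {
    "CRISPR1": (105, 124),
    "CRISPR2": (7, 59),
    "CRISPR3": (20, 53),
}


def arrangement_diversity(unique: int, carriers: int) -> float:
    """Fraction of carrier strains with a distinct spacer arrangement
    (illustrative proxy only, not a metric defined in the study)."""
    return unique / carriers


# Rank loci from most to least diverse arrangement-wise.
ranking = sorted(
    ((locus, arrangement_diversity(u, n)) for locus, (u, n) in counts.items()),
    key=lambda item: item[1],
    reverse=True,
)

for locus, ratio in ranking:
    print(f"{locus}: {ratio:.3f}")
# CRISPR1: 0.847
# CRISPR3: 0.377
# CRISPR2: 0.119
```

Under this proxy the ordering CRISPR1 > CRISPR3 > CRISPR2 matches the relative activity proposed in the text.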
Previous data have suggested that the enzymatic machinery of a specific locus cannot function in conjunction with the CRISPR genetic content of another (2). Specifically, when the cas genes of a particular CRISPR locus are inactivated, that locus loses its ability to provide resistance and to integrate novel spacers, despite the concurrent presence of other CRISPR loci and cas genes elsewhere in the chromosome (2). Here, we provide data indicating that each Cas system may be directly linked to a particular CRISPR repeat sequence, which is consistent with the comparable clustering observed for CRISPR repeats and Cas sequences (Fig. ), as previously suggested by Kunin et al. (16). Further studies investigating the mechanism of action of CRISPRs are currently under way and might provide insights into the roles of the various cas genes and the functional link between specific Cas proteins and a particular CRISPR repeat. Some Cas proteins are likely involved in the addition of novel repeat-spacer units, via a molecular interaction with CRISPR repeats. Others are likely involved in spacer-encoded resistance, which may be mediated via an RNAi-like mechanism (19). These probably include at least one nuclease that might recognize and digest a specific target sequence. This is supported by the recent discovery of a highly conserved motif, which we propose to name the CRISPR motif, immediately downstream of the proto-spacers found in phage sequences (7). For CRISPR1, the AGAAW CRISPR motif (W = A or T), located two nucleotides downstream of the proto-spacer, might serve as a recognition site for a CRISPR1-specific Cas nuclease (Fig. ). A different CRISPR motif, GGNG (N = any nucleotide), located one nucleotide downstream of the proto-spacer, was identified for CRISPR3 (Fig. ). This again suggests that each CRISPR locus has a unique CRISPR motif, which may serve as a sequence recognition pattern specific to a particular Cas enzymatic machinery. CRISPR motifs may therefore serve as additional elements to define a particular CRISPR/Cas system.
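To make the motif idea concrete, the sketch below checks whether exact matches to a spacer in a phage sequence (proto-spacers) are followed by a given CRISPR motif at a fixed distance: AGAAW two nucleotides downstream for CRISPR1, and GGNG one nucleotide downstream for CRISPR3. The function names and sequences are hypothetical. The sketch scans one strand only and requires perfect spacer identity, whereas a real analysis would also scan the reverse complement and might tolerate mismatches.

```python
import re

# Subset of IUPAC nucleotide codes used by the two motifs discussed above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}


def motif_regex(motif: str) -> str:
    """Translate an IUPAC motif such as AGAAW into a regular expression."""
    return "".join(IUPAC[base] for base in motif)


def find_protospacers(spacer: str, phage: str, motif: str, offset: int) -> list:
    """Start positions of exact proto-spacer matches (same strand only) whose
    flanking sequence, `offset` nt downstream of the match, fits `motif`."""
    pattern = re.compile(motif_regex(motif))
    hits = []
    start = phage.find(spacer)
    while start != -1:
        motif_start = start + len(spacer) + offset
        window = phage[motif_start:motif_start + len(motif)]
        if pattern.fullmatch(window):
            hits.append(start)
        start = phage.find(spacer, start + 1)
    return hits


# Hypothetical, shortened sequences for illustration only.
spacer = "ACGGTCATGCAA"
phage = "CC" + spacer + "TT" + "AGAAT" + "GG" + spacer + "TT" + "CCCCC"

# CRISPR1-like rule: AGAAW, two nucleotides downstream of the proto-spacer.
print(find_protospacers(spacer, phage, "AGAAW", offset=2))  # [2]
# CRISPR3-like rule: GGNG, one nucleotide downstream of the proto-spacer.
print(find_protospacers(spacer, "A" + spacer + "CGGAGT", "GGNG", offset=1))  # [1]
```

In the first example, the phage carries two copies of the proto-spacer, but only the first is followed by an AGAAW-compatible flank, so only its position is reported.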
FIG. 7. CRISPR motifs identified in the vicinity of the CRISPR proto-spacers. (A) Motif identified in the vicinity of CRISPR1 proto-spacers in the genome of the phage used in the challenge; (B) motif identified in the vicinity of CRISPR3 proto-spacers in the …
We have shown that two distinct CRISPR loci, namely, CRISPR1 and CRISPR3, have the ability to evolve directly in response to phages by the polarized addition of new spacers derived from viral genomic sequences. Accordingly, CRISPR spacers provide a historical perspective on phage exposure, whereby spacers in the vicinity of the leader were added relatively recently, whereas distal spacers likely originated from earlier events.
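The polarized addition described above can be modeled, very loosely, as a double-ended queue in which each new spacer is pushed onto the leader end, so that reading the array from the leader outward lists spacers from newest to oldest. This is a toy model under stated assumptions: repeats and the leader sequence are implicit, spacers are represented by hypothetical labels, and deletions are ignored.

```python
from collections import deque


class CrisprArray:
    """Toy CRISPR array: spacers stored leader-proximal first (repeats implicit)."""

    def __init__(self, spacers=()):
        # Index 0 is the leader end; the last element is the trailer end.
        self.spacers = deque(spacers)

    def acquire(self, spacer: str) -> None:
        """Polarized addition: the new repeat-spacer unit goes next to the leader."""
        self.spacers.appendleft(spacer)

    def newest_first(self) -> list:
        """Spacers ordered from most recently to least recently acquired."""
        return list(self.spacers)


# Hypothetical spacer labels; S3 is the most recent of the pre-existing spacers.
array = CrisprArray(["S3", "S2", "S1"])
array.acquire("phageA_1")
array.acquire("phageB_1")
print(array.newest_first())
# ['phageB_1', 'phageA_1', 'S3', 'S2', 'S1']
```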
In addition to the CRISPR variability due to the acquisition of novel spacers in response to phages, which occurs primarily at the leader end, we noticed that modifications can occur throughout the CRISPR locus, as seen in DGCC7710Φ2972+S15, where a deletion occurred concomitantly with the insertion of a new spacer at the leader end (Fig. ). Notably, most of the variability observed at the trailer end of the locus seems to occur via deletion (Fig. ), arguably resulting in the preferential loss of older spacers, which are likely less valuable to the bacterium in its current environment. This phenomenon is probably due to homologous recombination events between CRISPR direct repeats. Conversely, recently acquired spacers may be more valuable and thus more likely to be retained in the current environment. In some instances, particular spacers seem to be retained in seemingly distant strains, perhaps indicating that they provide a critical function (Fig. ), such as targeting a conserved phage sequence. Altogether, CRISPR loci seem to evolve through both additions and deletions of repeat-spacer units.
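The comparison described here, in which a derived array gains a spacer at the leader end while losing other repeat-spacer units, can be sketched as a simple diff of two spacer lists. The function names and spacer labels are hypothetical. The contiguity check encodes only the assumption that a single recombination event between two direct repeats excises one uninterrupted run of repeat-spacer units; it does not prove that such an event occurred.

```python
def compare_arrays(ancestral: list, derived: list) -> tuple:
    """Compare two spacer lists, each ordered from the leader end.

    Returns (added_at_leader, deleted):
    - spacers at the leader end of `derived` that precede its first spacer
      shared with `ancestral` (new acquisitions);
    - ancestral spacers missing from `derived`, in ancestral order.
    Assumes each spacer has a unique label.
    """
    ancestral_set, derived_set = set(ancestral), set(derived)
    added_at_leader = []
    for spacer in derived:
        if spacer in ancestral_set:
            break
        added_at_leader.append(spacer)
    deleted = [s for s in ancestral if s not in derived_set]
    return added_at_leader, deleted


def is_single_block(ancestral: list, deleted: list) -> bool:
    """True if the deleted spacers were adjacent in the ancestral array, as
    expected from one recombination event between two direct repeats."""
    positions = [ancestral.index(s) for s in deleted]
    return bool(positions) and positions == list(
        range(positions[0], positions[0] + len(positions))
    )


# Hypothetical arrays, leader end first.
ancestral = ["S5", "S4", "S3", "S2", "S1"]
derived = ["N1", "S5", "S4", "S1"]
added, lost = compare_arrays(ancestral, derived)
print(added, lost, is_single_block(ancestral, lost))
# ['N1'] ['S3', 'S2'] True
```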
Similarities between CRISPR spacers and phage or plasmid sequences have been documented previously (2). Although the majority of CRISPR spacers show homology to phage (77%) and plasmid (16%) sequences, we identified four CRISPR spacers that are 100% identical to S. thermophilus chromosomal gene sequences, including dtpT. This may indicate that the CRISPR/Cas system, in addition to providing resistance against foreign genetic elements such as plasmids and phages, also serves as a microbial regulatory system involved in controlling mRNA transcript levels for chromosomally encoded genes, perhaps through an RNAi-based mechanism, as previously suggested (19).
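Flagging spacers such as those identical to chromosomal genes reduces to an exact, full-length match on either DNA strand, since a spacer may correspond to either strand of its target. A minimal sketch follows; the reference names and sequences are hypothetical, and the broader phage and plasmid homology figures reported above would normally come from a similarity search tool such as BLAST, which this exact-match check does not replace.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def exact_hits(spacer: str, references: dict) -> list:
    """Names of reference sequences containing the spacer with 100% identity,
    on either strand."""
    rc = reverse_complement(spacer)
    return [name for name, seq in references.items() if spacer in seq or rc in seq]


# Hypothetical spacer and reference sequences for illustration only.
spacer = "ATGCGTAC"
references = {
    "phage_X": "TTT" + spacer + "GGG",
    "plasmid_Y": "CCCCCCCCCC",
    "chromosome": "AAA" + reverse_complement(spacer) + "AAA",
}
print(exact_hits(spacer, references))  # ['phage_X', 'chromosome']
```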
Overall, the dynamic nature of CRISPR loci is potentially valuable for typing and comparative analyses of strains and microbial populations. Given that some loci are relatively active while others show lower levels of polymorphism, the potential of a given CRISPR locus for typing and epidemiological studies has to be assessed on a case-by-case basis. Since CRISPRs are widely distributed in Bacteria and Archaea and are actively involved in an adaptive immune system against foreign genetic elements, and possibly against intrinsic chromosomal elements, they provide critical insights into the relationships between prokaryotes and their environments, notably the coevolution of host and viral genomes.
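In its simplest form, a CRISPR-based typing scheme could place strains that share identical spacer arrangements at every locus into the same type, mirroring the notion of unique spacer arrangements used above. The sketch below assumes hypothetical strain names and spacer labels and treats any difference in spacer content or order as a distinct type; a practical scheme might instead restrict the comparison to the most polymorphic locus or tolerate small differences.

```python
from collections import defaultdict


def crispr_types(strains: dict) -> dict:
    """Group strains that share identical spacer arrangements at every locus.

    `strains` maps a strain name to {locus: sequence of spacer labels,
    leader end first}; a locus missing from the mapping is treated as absent.
    """
    groups = defaultdict(list)
    for name, loci in strains.items():
        # Key on the full (locus, ordered spacers) content of the strain.
        key = tuple(sorted((locus, tuple(spacers)) for locus, spacers in loci.items()))
        groups[key].append(name)
    return dict(groups)


# Hypothetical strains and spacer labels.
strains = {
    "strain_A": {"CRISPR1": ("a3", "a2", "a1"), "CRISPR3": ("c1",)},
    "strain_B": {"CRISPR1": ("a3", "a2", "a1"), "CRISPR3": ("c1",)},
    "strain_C": {"CRISPR1": ("b1", "a2", "a1")},
}
types = crispr_types(strains)
print(len(types))  # 2
print(sorted(types.values()))  # [['strain_A', 'strain_B'], ['strain_C']]
```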