Most of the archaea and numerous bacteria possess an elaborate system of adaptive immunity to mobile genetic elements known as the CRISPR (clustered regularly interspaced short palindromic repeats)-associated system (CRISPR-Cas), which consists of arrays of short repeats interspersed with unique DNA spacers and adjacent operons encompassing CRISPR-associated (cas) genes with predicted and, in some cases, experimentally validated nuclease, helicase, and polymerase activities. The system functions by integrating fragments of alien DNA between the repeats and employing their transcripts to degrade the DNA of the respective invading elements via an RNA interference-like mechanism. The CRISPR-Cas system is a case of apparent Lamarckian inheritance.
Bacteria and archaea exist in an incessant arms race with various selfish genetic elements (phages, transposons, and plasmids) and have evolved a variety of defense systems. The best known are probably the numerous restriction-modification enzyme systems, which exploit differences in methylation patterns between host DNA and the DNA of the infecting agent to eliminate the invader. Recently, a novel widespread defense system that functions on a completely different principle was discovered; it became known as the CRISPR (clustered regularly interspaced short palindromic repeats)-associated system, usually referred to as CRISPR-Cas (where Cas stands for CRISPR-associated proteins) or, alternatively, as CASS [2-4]. The discovery of this system involved considerable intrigue and serendipity. Distinct arrays of short repeats interspersed with unique spacers (CRISPR) had been seen in bacterial and archaeal genomes for years, with no clues as to their possible functions [5,6]. Independently, Cas protein sequences encoded by putative operons adjacent to CRISPR were analyzed in detail and found to contain domains characteristic of several nucleases, a helicase, a polymerase, and RNA-binding proteins; it was suggested that these proteins might belong to a novel repair system. New light was shed on the probable function of CRISPR when it was observed that some of the unique inserts were (nearly) identical to fragments of phage and plasmid genes, a pivotal observation that immediately led to the idea that CRISPR might be involved in defense against selfish elements [9-11]. These findings were combined with the results of comprehensive computational re-analysis of the Cas proteins to develop a detailed hypothesis on the mechanism of CRISPR-Cas.
This hypothesis drew a close analogy between the putative novel prokaryotic defense system and the eukaryotic RNA interference (RNAi) mechanisms, with the important difference that CASS mediates integration of a piece of alien DNA into the host genome as the first step in the sequence of events that leads to immunity to the given agent. Specific roles for individual Cas proteins were proposed as well, on the basis of their domain composition and by analogy with RNAi components, although the proteins involved are not homologous. This hypothesis prompted direct experiments that demonstrated that engineering a specific bacteriophage sequence into the CRISPR locus of the lactic acid bacterium Streptococcus thermophilus indeed conferred resistance to the cognate phage, an effect that was abrogated by even a single mismatch between the insert and the target gene. These key experiments clinched the case for the defense role of CRISPR-Cas and triggered an avalanche of further genetic and biochemical experiments.
CRISPR-Cas is a highly diverse constellation of genes, with the number of CRISPR-cas loci, cas gene repertoire, and (predicted) operon organization often changing even between closely related strains [3,12,15-18]. Comparative analysis of operon architectures revealed seven distinct types of CRISPR-Cas, each of which is characterized by a distinct signature of genomic architecture. Only two genes, cas1 and cas2, are invariably present in each CRISPR-Cas system so far detected and accordingly can be used as genomic markers of CRISPR-Cas. In addition to the two universal genes, three genes (cas3, cas4, and cas5) are present in the majority of CRISPR-Cas, and approximately 20 other genes are found in various subsets of these systems [12,15]. Cas5 and several other less common CRISPR-Cas components belong to the large and extremely diverged superfamily of repeat-associated mysterious proteins (RAMPs) [12,15].
Using the highly conserved Cas1 protein sequence as a marker, we detected CRISPR-Cas in 297 of the 774 analyzed prokaryotic genomes (approximately 38%); CRISPR-Cas is much more common among archaea than among bacteria: up to 90% of the available archaeal genomes carry CRISPR-Cas (Figure 1). The representation of CRISPR-Cas in the genomes of diverse groups of archaea and bacteria varies over a broad range, from ubiquity to complete absence (although it should be noted that all groups completely lacking CRISPR-Cas are currently represented by a small number of genomes) (Figure 1).
Phylogenetic analysis of core cas genes, such as cas1 and cas3, fails to recover major bacterial and archaeal lineages, an observation that appears to be indicative of extensive mobility of CRISPR-cas modules via horizontal gene transfer (HGT). The cas genes are not only horizontally mobile but also typically show high rates of sequence evolution, with the partial exception of core genes, in particular cas1 [12,15]. In many cases, this fast evolution renders sequence conservation between homologous Cas proteins barely detectable, most conspicuously among the RAMPs, which are propagated by both gene duplication and HGT and constitute a large fraction of Cas protein sets in most CRISPR-Cas-carrying prokaryotes. The RAMPs are extremely diverged in sequence, so that the demonstration that different RAMP families were related and possessed the same fold required the careful use of the most sensitive sequence analysis methods (and even so, it is likely that additional RAMPs have been missed). Conceivably, RAMPs and perhaps some other Cas proteins evolve under positive selection dictated by the arms race with selfish elements [19,20]. This possibility is congruent with the observations that, although virus or plasmid origin is apparent for a considerable number of CRISPR spacers, the majority of the spacers are not significantly similar to any sequences in current databases [9,19,20]. In addition, deletion of CRISPR units (a repeat with a spacer) has been demonstrated. Thus, it appears, first, that the repertoire of selfish genetic elements encountered by archaea and bacteria is vast, and second, that the CRISPR-Cas-mediated immunity is short-lived; that is, spacers rapidly deteriorate by mutation once the cognate element is no longer a threat.
Taken together, these observations identify the CRISPR-Cas as a bona fide component of the prokaryotic mobilome [that is, the totality of genetic elements that are characterized by extensive horizontal mobility and include selfish elements (viruses, plasmids, transposons, and so on) as well as defense and stress response systems] [22,23]. Notably, CRISPR-cas loci are often located within ‘defense islands’ (i.e., regions of bacterial and archaeal genomes that consist primarily of genes encoding defense and stress response systems, such as restriction-modification and toxin-antitoxin modules). This genomic association permits the prediction of novel prokaryotic defense systems. Comparative-genomic analysis of the CRISPR-cas loci is facilitated by the use of specialized databases and accompanying custom software tools for CRISPR detection.
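The repeat-spacer layout that CRISPR detection tools search for can be illustrated with a toy scan. The following is only a minimal sketch under strong assumptions and is not the algorithm of any published tool: repeats are taken to be exact copies of a known length, spacers must be unique and fall within a fixed length window, and the function name `find_array` and its parameters are hypothetical.

```python
def find_array(seq, repeat_len, min_spacer=10, max_spacer=60, min_units=3):
    """Return (repeat, spacers) for the first exact repeat-spacer array, or None.

    Toy assumptions: all repeats are identical copies of length `repeat_len`,
    and every spacer is unique and min_spacer..max_spacer bases long.
    """
    for start in range(len(seq) - repeat_len + 1):
        repeat = seq[start:start + repeat_len]
        spacers = []
        pos = start + repeat_len  # first base after the current repeat copy
        while True:
            # Search for the next repeat copy within the allowed spacer window.
            nxt = seq.find(repeat, pos + min_spacer, pos + max_spacer + repeat_len)
            if nxt == -1:
                break
            spacers.append(seq[pos:nxt])
            pos = nxt + repeat_len
        enough_units = len(spacers) + 1 >= min_units
        all_unique = len(set(spacers)) == len(spacers)
        if enough_units and all_unique:
            return repeat, spacers
    return None


# Usage with a made-up sequence: four repeat copies around three spacers.
repeat = "GTTTTAGAGC"
demo_spacers = ["AAACCCGGGTTA", "TGCATGCATGCC", "CCCCGGGGAAAG"]
demo_seq = "AT" + repeat + "".join(s + repeat for s in demo_spacers) + "TA"
print(find_array(demo_seq, repeat_len=10))
```

Requiring unique spacers is what separates this layout from a plain tandem repeat, in which the sequence between repeat copies is itself repeated.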
CRISPR-Cas systems mediate immunity to invading genetic elements via three distinct stages: (a) adaptation, (b) expression and processing of CRISPR, and (c) interference. The full molecular picture is far from being clear for each of these stages, but recently several fundamental results, particularly on the processing of CRISPR transcripts, have been reported. With regard to the adaptation stage, following the original work that demonstrated the insertion of a phage-specific spacer into the CRISPR locus of Streptococcus thermophilus, this process was explored systematically, leading to the conclusion that a phage challenge typically triggers insertion of a single phage-specific resistance-conferring spacer with a characteristic length of 30 base pairs; successive infection of a bacterial culture with multiple phages led to the accumulation of the cognate spacers in the CRISPR loci. Furthermore, it has been shown that insertion of new spacers depends on short PAMs (proto-spacer adjacent motifs), which differ between variants of the CRISPR-Cas system and might determine the identity of the inserted spacer.
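The adaptation stage described above can be sketched as a toy model: scan the invader DNA for a PAM, take the adjacent 30-base fragment as the proto-spacer, and add one new spacer per challenge. Everything specific here is an assumption made for illustration: the motif "AGAA", the choice to take the fragment immediately upstream of the PAM, the prepending of the new spacer to the list, and the names `candidate_protospacers` and `adapt`. Real PAMs and their position relative to the proto-spacer differ between CRISPR-Cas variants.

```python
SPACER_LEN = 30  # characteristic spacer length quoted in the text


def candidate_protospacers(invader, pam):
    """Yield each SPACER_LEN-long fragment lying immediately upstream of `pam`."""
    i = invader.find(pam)
    while i != -1:
        if i >= SPACER_LEN:  # skip PAMs too close to the start of the sequence
            yield invader[i - SPACER_LEN:i]
        i = invader.find(pam, i + 1)


def adapt(spacers, invader, pam):
    """Return a new spacer list with at most one invader-derived spacer added."""
    for proto in candidate_protospacers(invader, pam):
        if proto not in spacers:
            return [proto] + spacers  # a single new spacer per challenge
    return list(spacers)  # no qualifying proto-spacer: array unchanged


# Usage with made-up sequences: two successive challenges accumulate two spacers.
proto_a = "TTGACCATGCAAGGCTTACGGATCAGTCCA"
proto_b = "GGCATTCGATCGGTACCTTGACGTCATGCC"
phage_a = "ACGT" * 10 + proto_a + "AGAA" + "CCCC"
phage_b = "ACGT" * 10 + proto_b + "AGAA" + "CCCC"
array = adapt(adapt([], phage_a, "AGAA"), phage_b, "AGAA")
print(len(array))
```

Returning a new list rather than mutating the input keeps each challenge easy to compare against the previous state of the array.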
The original ‘prokaryotic RNAi’ hypothesis maintained that CRISPR-Cas systems would target mRNAs of invading agents. However, the first experiments that, in general, validated the hypothesis have also shown that both strands of the CRISPR spacer DNA were effective in conferring immunity to the cognate phage, an observation best compatible with a DNA target. A more direct experiment showed that the insertion of a self-splicing intron into the target gene made the respective plasmid resistant to the CRISPR-mediated immunity, a clear indication that the invading DNA itself is targeted. Whether this conclusion is general and applies to all CRISPR-Cas remains to be determined, especially given the extreme diversity of the architectures of these systems.
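Two of the reported observations, that a spacer from either strand confers immunity and that a single mismatch abolishes it, can be captured in a toy matching rule. This sketch uses exact string matching against double-stranded invader DNA as a stand-in for guide-target base pairing; it says nothing about the actual biochemistry, and the names `reverse_complement` and `is_targeted` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def is_targeted(spacers, invader_dna):
    """True if any spacer matches either strand of the invader exactly.

    Exact matching is a simplifying assumption: one mismatch anywhere
    in the spacer is enough to lose the match.
    """
    return any(
        s in invader_dna or reverse_complement(s) in invader_dna
        for s in spacers
    )


# Usage with made-up sequences.
spacer = "ATGCCGTTAGGCATCGATTACGGCTAAGCT"
phage = "GG" + spacer + "TT"
mutant = phage[:10] + "C" + phage[11:]  # one A-to-C substitution inside the target
print(is_targeted([spacer], phage))
print(is_targeted([reverse_complement(spacer)], phage))
print(is_targeted([spacer], mutant))
```

Because both strands of the invader are checked, the rule models a DNA target; an mRNA target would be matched by a spacer of one orientation only.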
As of September 2009, biochemical activities and/or crystal structures of several widespread Cas proteins have been determined (Table 1). In agreement with the computational predictions, nuclease activities (RNAse, DNAse, or both) were demonstrated for several Cas proteins. Notably, these novel nucleases include both universal Cas proteins. Specifically, Cas1 has been shown to be a metal-dependent DNAse with no sequence specificity and has been implicated in the integration of the alien DNA into the CRISPR cassettes. Cas2 has been characterized as a metal-dependent endoribonuclease whose role in the CRISPR-Cas mechanism remains unclear. A striking finding is that some of the RAMP proteins that contain a double ferredoxin-fold domain and were originally proposed to be non-enzymatic RNA-binding proteins (considering their extreme sequence divergence) actually possess RNAse activity that is apparently involved in the processing of CRISPR transcripts [30,31]. In particular, a RAMP protein seems to be the active moiety of CASCADE (CRISPR-associated complex for antiviral defense), a complex that consists of five Cas proteins (Table 1) and is the CRISPR-processing machine of Escherichia coli. In concert with the Cas3 protein, which consists of (predicted) helicase and nuclease domains, CASCADE seems to be involved in the interference stage.
Comparative-genomic predictions validated by a rapidly growing body of experimental results indicate that the CRISPR-Cas is an adaptive immunity system that is widely employed by archaea and bacteria for defense against diverse invading elements, in particular, viruses. The system functions by integrating fragments of alien element genes into CRISPR loci and employing the resulting spacers, after transcription and processing, as guide RNAs to abrogate the replication of the cognate elements by cleaving nucleic acid molecules complementary to the guide. In some cases, at least, the target of CRISPR-Cas is the genomic DNA of an invading genetic element. Experiments aimed at molecular dissection of CASS have confirmed the predicted principle of its action and are starting to reveal multiple activities of the protein components of CASS and the molecular architecture of complexes formed by these proteins. However, an enormous amount of experimental work remains to be done to elucidate the mechanisms of CASS, in particular, the molecular details of spacer incorporation into the CRISPR loci and the specific pathways of RNA-guided destruction of alien genomes. These experiments can be expected to reveal considerable mechanistic diversity that reflects the extreme diversity of cas gene repertoires and operonic organization. Another important direction of future work is the characterization of the arms race between CRISPR-Cas and viruses of prokaryotes and elucidation of putative mechanisms of counterdefense employed by the viruses. Finally, it is worth noting that, by integrating fragments of invaders' genomes into the genomes of the archaeal and bacterial hosts, the CASS effectively operates via a Lamarckian-type inheritance of acquired characters.
The authors are supported by the Department of Health and Human Services (National Institutes of Health, National Library of Medicine) intramural funds.
The electronic version of this article is the complete one and can be found at: http://F1000.com/Reports/Biology/content/1/95
The authors declare that they have no competing interests.