The clustered regularly interspaced short palindromic repeats (CRISPR) and their associated proteins comprise a significant prokaryotic defense system against viruses and horizontally transferred nucleic acids (1–4). This defense system consists of a CRISPR array that is usually preceded by a leader sequence and located near a cluster of CRISPR-associated (cas) genes (5–7). RNA transcribed from the CRISPR array is processed by Cas proteins and directs interfering proteins to target nucleic acids matching the sequences between the repeats. These sequences, called spacers, often originate from plasmids and phages, and thus the system adaptively targets these invaders.
The adaptation process of the CRISPR system, i.e. acquisition of new spacers into the genome, is still poorly understood. Barrangou et al. were the first to report spacer acquisition into the CRISPR array of Streptococcus thermophilus. They showed that bacteria surviving a phage challenge expanded their CRISPR array with spacers identical to short DNA regions of the challenging phage, termed protospacers. Spacer acquisition appeared polarized toward the leader end of the array. Their study identified no bias of sampled protospacers toward a specific strand, nor a preference for a specific region of the phage DNA. Knockout of csn2 [previously annotated cas7] dramatically reduced spacer acquisition, providing indirect evidence that the csn2 product is essential for adaptation of the CRISPR array in S. thermophilus. Later, van der Ploeg characterized in vivo spacer acquisition in Streptococcus mutans. He observed acquisition of new spacers in ~25% of phage-resistant mutants. In this case too, the acquired spacers corresponded to protospacers randomly distributed with respect to strand and position in the phage genome (9). These studies did not address the roles of the repeats, the leader and the core Cas proteins in the acquisition process.
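The strand and position analyses described above amount to locating each acquired spacer in the phage genome. The following is a minimal Python sketch, assuming exact matches and uppercase A/C/G/T sequences; the function names and toy sequences are hypothetical and not taken from the studies cited.

```python
from collections import Counter

# Translation table for base complementation (uppercase DNA only; an assumption).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(spacer: str, genome: str) -> list[tuple[str, int]]:
    """Return (strand, start) for every exact match of the spacer.

    Positions are 0-based on the given (+) strand of the genome; a '-' hit
    means the reverse complement of the spacer occurs at that position.
    """
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        start = genome.find(query)
        while start != -1:  # also catches overlapping occurrences
            hits.append((strand, start))
            start = genome.find(query, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


def strand_counts(spacers: list[str], genome: str) -> Counter:
    """Tally protospacer hits per strand across a set of acquired spacers."""
    counts = Counter()
    for spacer in spacers:
        for strand, _ in find_protospacers(spacer, genome):
            counts[strand] += 1
    return counts


# Toy usage on a hypothetical 20-nt "phage" sequence.
phage = "ATGCGTACCTTAGGCATCGA"
print(find_protospacers("GTACC", phage))  # [('+', 4)]
print(find_protospacers("TGCCT", phage))  # [('-', 11)]
```

Tallying '+' versus '-' hits over many acquired spacers is one way to test for strand bias, and plotting the start positions along the genome is one way to test for a regional preference.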
The core Cas proteins, Cas1 and Cas2, were hypothesized to play a major role in the acquisition process. This hypothesis rests on the fact that these two proteins have no role in the interference stage, yet they are conserved in most CRISPR loci (2). Involvement of Cas1 and Cas2 in the acquisition step is further supported by their endonuclease activities: Cas1 of Pseudomonas aeruginosa and of Escherichia coli was shown to function as a metal-dependent DNA endonuclease (11), and Cas2 of Sulfolobus solfataricus and other strains was shown to be an ssRNA-specific endonuclease (13). Nevertheless, direct evidence for Cas1 and Cas2 involvement in the adaptation process has not yet been provided.
It was suggested from DNA sequence analyses, and later shown experimentally, that short sequences of 2–5 bp found next to the protospacer, called protospacer adjacent motifs (PAMs), are crucial for the interference step (14). The requirement of PAMs for interference suggests that acquisition of new spacers requires DNA sequences flanked by PAMs. Indeed, spacers conferring phage resistance were shown to be identical in sequence to protospacers with PAMs (2). Moreover, some phage mutants escaping CRISPR/Cas interference harbored mutations in the PAMs, indicating that PAMs play a role in both the interference and adaptation steps (14).
Insights into the adaptation process were also obtained from in silico analyses. These analyses determined PAM sequences for six different CRISPR types based on sequence conservation adjacent to protospacers (15) [types under the most recent classification system (8) in parentheses]: 1, 2 (I-E), 3 (I-C), 4 (I-F), 7 and 10 (II). For example, the study found that the CRISPR-2 type (I-E), to which E. coli arrays I and II belong, has the PAM 5′-AWG. Sequence analyses of CRISPR arrays also indicated that the leader sequence may orient the acquisition of new spacers, yet no direct experimental evidence supporting these analyses was provided.
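The PAM 5′-AWG is written in IUPAC degenerate notation (W = A or T), so checking a protospacer for it means matching each flanking base against a set of allowed bases. The following is a minimal Python sketch, assuming the PAM lies immediately 5′ of the protospacer on the strand whose sequence matches the spacer (the `side` parameter lets this assumption be changed); the helper names and toy sequences are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes
# (subset: the four bases, the two-fold degenerate codes, and N).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def pam_regex(pam: str) -> re.Pattern:
    """Compile an IUPAC motif such as 'AWG' into a regex; unknown codes raise KeyError."""
    return re.compile("".join(IUPAC[base] for base in pam.upper()))


def has_pam(strand_seq: str, start: int, length: int,
            pam: str = "AWG", side: str = "5prime") -> bool:
    """Return True if the protospacer strand_seq[start:start+length] is
    flanked by the PAM on the chosen side (assumed 5' by default)."""
    k = len(pam)
    if side == "5prime":
        flank = strand_seq[start - k:start] if start >= k else ""
    elif side == "3prime":
        flank = strand_seq[start + length:start + length + k]
    else:
        raise ValueError("side must be '5prime' or '3prime'")
    # A truncated flank (protospacer at a sequence end) cannot carry the PAM.
    return len(flank) == k and pam_regex(pam).fullmatch(flank) is not None


# Toy usage: a 5-nt "protospacer" GTACC at position 5 of each sequence.
print(has_pam("CCAAGGTACCTT", 5, 5))  # True: 5' flank is AAG
print(has_pam("CCACGGTACCTT", 5, 5))  # False: 5' flank is ACG
```

Translating the degenerate code into a regular expression keeps one function usable for any of the PAMs reported for the different CRISPR types, rather than hard-coding AAG and ATG separately.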
Partly due to the lack of a robust experimental system to study adaptation, several basic questions have not yet been addressed experimentally: How are spacers incorporated into the genome? Which proteins are essential for this process? Are the leader or repeat sequences important for it? Which elements in the leader sequence are required? Here we describe a robust assay to study the adaptation process in E. coli and provide insights into the essential proteins, DNA elements and insertion mechanism of repeat-spacer units.