Well-studied innate immune systems exist throughout bacteria and archaea, but a more recently discovered genomic locus may offer prokaryotes surprising immunological adaptability. Mediated by a cassette-like genomic locus termed Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR), the microbial adaptive immune system differs from its eukaryotic immune analogues by incorporating new immunities unidirectionally. CRISPR thus stores genomically recoverable timelines of virus-host coevolution in natural organisms refractory to laboratory cultivation. Here we combined a population genetic mathematical model of CRISPR-virus coevolution with six years of metagenomic sequencing to link the recoverable genomic dynamics of CRISPR loci to the unknown population dynamics of virus and host in natural communities. Metagenomic reconstructions in an acid-mine drainage system document CRISPR loci conserving ancestral immune elements to the base-pair across thousands of microbial generations. This ‘trailer-end conservation’ occurs despite rapid viral mutation and despite rapid prokaryotic genomic deletion. The trailer-ends of many reconstructed CRISPR loci are also largely identical across a population. ‘Trailer-end clonality’ occurs despite predictions of host immunological diversity due to negative frequency dependent selection (kill the winner dynamics). Statistical clustering and model simulations explain this lack of diversity by capturing rapid selective sweeps by highly immune CRISPR lineages. Potentially explaining ‘trailer-end conservation,’ we record the first example of a viral bloom overwhelming a CRISPR system. The polyclonal viruses bloom even though they share sequences previously targeted by host CRISPR loci. Simulations show how increasing random genomic deletions in CRISPR loci purges immunological controls on long-lived viral sequences, allowing polyclonal viruses to bloom and depressing host fitness. 
Our results thus link documented patterns of genomic conservation in CRISPR loci to an evolutionary advantage against persistent viruses. By maintaining old immunities, selection may be tuning CRISPR-mediated immunity against viruses reemerging from lysogeny or migration.
Most microbes appear unculturable in the laboratory, limiting our knowledge of how virus and prokaryotic host evolve in natural systems. However, a genomic locus found in many prokaryotes, CRISPR, may offer cultivation-independent probes of virus-microbe coevolution. Using nearby genes, CRISPR can serially incorporate short viral and plasmid sequences. These sequences bind and cleave cognate regions in subsequent viral and plasmid insertions, conferring adaptive anti-viral and anti-plasmid immunity. By incorporating sequences unidirectionally, CRISPR also provides timelines of virus-prokaryote coevolution. Yet CRISPR incorporates only 30–80 base-pair viral sequences, leaving incomplete coevolutionary recordings. To reconstruct the missing coevolutionary dynamics shaping natural CRISPRs, we combined metagenomic reconstructions with population-scale mathematical modeling. Capturing rare and rapid sweeps of CRISPR diversity by highly immune lines, mathematical modeling explains why naturally reconstructed CRISPR loci are often largely identical across a population. Both model and experiment further document surprising proliferations of old viral sequences against which hosts had preexisting CRISPR immunity. Due to these deadly blooms of ancestral viral elements, CRISPR's conservation of old immune sequences appears to confer a selective advantage. This may explain the striking immunological memory documented in CRISPR loci, which occurs despite rapid viral mutation and despite rapid deletions in prokaryotic genomes.
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) loci, together with cas (CRISPR–associated) genes, form the CRISPR/Cas adaptive immune system, a primary defense strategy that eubacteria and archaea mobilize against foreign nucleic acids, including phages and conjugative plasmids. Short spacer sequences separated by the repeats are derived from foreign DNA and direct interference to future infections. The availability of hundreds of shotgun metagenomic datasets from the Human Microbiome Project (HMP) enables us to explore the distribution and diversity of known CRISPRs in human-associated microbial communities and to discover new CRISPRs. We propose a targeted assembly strategy to reconstruct CRISPR arrays, which whole-metagenome assemblies fail to identify. For each known CRISPR type (identified from reference genomes), we use its direct repeat consensus sequence to recruit reads from each HMP dataset and then assemble the recruited reads into CRISPR loci; the unique spacer sequences can then be extracted for analysis. We also identified novel CRISPRs or new CRISPR variants in contigs from whole-metagenome assemblies and used targeted assembly to more comprehensively identify these CRISPRs across samples. We observed that the distributions of CRISPRs (including 64 known and 86 novel ones) are largely body-site specific. We provide detailed analysis of several CRISPR loci, including novel CRISPRs. For example, known streptococcal CRISPRs were identified in most oral microbiomes, totaling ∼8,000 unique spacers: samples resampled from the same individual and oral site shared the most spacers; different oral sites from the same individual shared significantly fewer, while different individuals had almost no common spacers, indicating the impact of subtle niche differences on the evolution of CRISPR defenses. We further demonstrate potential applications of CRISPRs to the tracing of rare species and the virus exposure of individuals. 
This work indicates the importance of effective identification and characterization of CRISPR loci to the study of the dynamic ecology of microbiomes.
Human bodies are complex ecological systems in which various microbial organisms and viruses interact with each other and with the human host. The Human Microbiome Project (HMP) has resulted in >700 datasets of shotgun metagenomic sequences, from which we can learn about the compositions and functions of human-associated microbial communities. CRISPR/Cas systems are a widespread class of adaptive immune systems in bacteria and archaea, providing acquired immunity against foreign nucleic acids: CRISPR/Cas defense pathways involve integration of viral- or plasmid-derived DNA segments into CRISPR arrays (forming spacers between repeated structural sequences), and expression of short crRNAs from these single repeat-spacer units, to generate interference against future invading foreign genomes. Powered by an effective computational approach (the targeted assembly approach for CRISPR), our analysis of CRISPR arrays in the HMP datasets provides the very first global view of bacterial immunity systems in human-associated microbial communities. The great diversity of CRISPR spacers we observed among different body sites, in different individuals, and in single individuals over time indicates the impact of subtle niche differences on the evolution of CRISPR defenses and points to the key role of bacteriophage (and plasmids) in shaping human microbial communities.
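The targeted assembly idea described above (recruit reads with a known repeat consensus, then read off the spacers between repeats) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes exact repeat matches, skips the assembly step entirely, and the repeat and spacer sequences are hypothetical values chosen only for illustration.

```python
# Minimal sketch of repeat-based read recruitment and spacer extraction.
# Assumptions (not from the source): exact matching, no assembly step,
# and a made-up repeat sequence.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def recruit_reads(reads, repeat):
    """Keep reads that contain the repeat on either strand."""
    rc = reverse_complement(repeat)
    return [r for r in reads if repeat in r or rc in r]


def extract_spacers(locus: str, repeat: str):
    """Return the sequences flanked by a repeat on both sides."""
    parts = locus.split(repeat)
    # The first and last pieces are leader/trailer flanks, not spacers.
    return [p for p in parts[1:-1] if p]


def shared_spacers(spacers_a, spacers_b):
    """Spacers present in both samples."""
    return set(spacers_a) & set(spacers_b)


REPEAT = "GTCGCACTC"  # hypothetical repeat consensus
locus = "ACGT" + REPEAT + "AAACCCGGG" + REPEAT + "TTTGGGAAA" + REPEAT + "TGCA"
reads = [locus, "ACGTACGTACGT", reverse_complement(locus)]

print(len(recruit_reads(reads, REPEAT)))
print(extract_spacers(locus, REPEAT))
```

Comparing the spacer sets of two samples with `shared_spacers` mirrors the kind of within-individual versus between-individual comparison the abstract reports, under the same simplifying assumptions.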
The human bacterial pathogen Listeria monocytogenes is emerging as a model organism to study RNA-mediated regulation in pathogenic bacteria. A class of non-coding RNAs called CRISPRs (clustered regularly interspaced short palindromic repeats) has been described to confer bacterial resistance against invading bacteriophages and conjugative plasmids. CRISPR function relies on the activity of CRISPR associated (cas) genes that encode a large family of proteins with nuclease or helicase activities and DNA and RNA binding domains. Here, we characterized a CRISPR element (RliB) that is expressed and processed in the L. monocytogenes strain EGD-e, which is completely devoid of cas genes. Structural probing revealed that RliB has an unexpected secondary structure comprising basepair interactions between the repeats and the adjacent spacers in place of canonical hairpins formed by the palindromic repeats. Moreover, in contrast to other CRISPR-Cas systems identified in Listeria, RliB-CRISPR is ubiquitously present among Listeria genomes at the same genomic locus and is never associated with the cas genes. We showed that RliB-CRISPR is a substrate for the endogenously encoded polynucleotide phosphorylase (PNPase) enzyme. The spacers of the different Listeria RliB-CRISPRs share many sequences with temperate and virulent phages. Furthermore, we show that a cas-less RliB-CRISPR lowers the acquisition frequency of a plasmid carrying the matching protospacer, provided that trans encoded cas genes of a second CRISPR-Cas system are present in the genome. Importantly, we show that PNPase is required for RliB-CRISPR mediated DNA interference. Altogether, our data reveal a previously undescribed CRISPR system whose processing and activity both depend on PNPase, highlighting a new and unexpected function for PNPase in “CRISPRology”.
CRISPR-Cas systems confer on bacteria and archaea an adaptive immunity that protects them against invading bacteriophages and plasmids. In this study, we characterize a CRISPR (RliB-CRISPR) that is present in all L. monocytogenes strains at the same genomic locus but is never associated with a cas operon. It is an unusual CRISPR that, as we demonstrate, has a secondary structure consisting of basepair interactions between the repeat sequence and the adjacent spacer. We show that the RliB-CRISPR is processed by the endogenously encoded polynucleotide phosphorylase enzyme (PNPase). In addition, we show that the RliB-CRISPR system requires both PNPase and the presence of trans encoded cas genes of a second CRISPR-Cas system to mediate DNA interference directed against a plasmid carrying a matching protospacer. Altogether, our data reveal a novel type of CRISPR system in bacteria that requires the endogenously encoded PNPase enzyme for its processing and interference activity.
Clustered regularly interspaced short palindromic repeats (CRISPR) are hypervariable loci widely distributed in prokaryotes that provide acquired immunity against foreign genetic elements. Here, we characterize a novel Streptococcus thermophilus locus, CRISPR3, and experimentally demonstrate its ability to integrate novel spacers in response to bacteriophage. Also, we analyze CRISPR diversity and activity across three distinct CRISPR loci in several S. thermophilus strains. We show that both CRISPR repeats and cas genes are locus specific and functionally coupled. A total of 124 strains were studied, and 109 unique spacer arrangements were observed across the three CRISPR loci. Overall, 3,626 spacers were analyzed, including 2,829 for CRISPR1 (782 unique), 173 for CRISPR2 (16 unique), and 624 for CRISPR3 (154 unique). Sequence analysis of the spacers revealed homology and identity to phage sequences (77%), plasmid sequences (16%), and S. thermophilus chromosomal sequences (7%). Polymorphisms were observed for the CRISPR repeats, CRISPR spacers, cas genes, CRISPR motif, locus architecture, and specific sequence content. Interestingly, CRISPR loci evolved both via polarized addition of novel spacers after exposure to foreign genetic elements and via internal deletion of spacers. We hypothesize that the level of diversity is correlated with relative CRISPR activity and propose that the activity is highest for CRISPR1, followed by CRISPR3, while CRISPR2 may be degenerate. Globally, the dynamic nature of CRISPR loci might prove valuable for typing and comparative analyses of strains and microbial populations. Also, CRISPRs provide critical insights into the relationships between prokaryotes and their environments, notably the coevolution of host and viral genomes.
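The spacer counts reported above can be tallied in a few lines of Python. The sketch below checks that the per-locus totals add up to the stated 3,626 spacers and ranks the loci by the fraction of spacers that are unique. That fraction is an illustrative diversity measure assumed here, not a metric taken from the study.

```python
# Spacer counts per locus as reported in the text: (total, unique).
spacer_counts = {
    "CRISPR1": (2829, 782),
    "CRISPR2": (173, 16),
    "CRISPR3": (624, 154),
}

# Sum of all analyzed spacers across the three loci.
total = sum(t for t, _ in spacer_counts.values())

# Fraction of spacers that are unique, per locus (illustrative measure).
unique_fraction = {locus: u / t for locus, (t, u) in spacer_counts.items()}

# Loci ordered from most to least diverse under this measure.
ranking = sorted(unique_fraction, key=unique_fraction.get, reverse=True)

print(total)
for locus in ranking:
    print(locus, round(unique_fraction[locus], 3))
```

Under this simple measure the ordering is CRISPR1, then CRISPR3, then CRISPR2, which agrees with the activity ranking the authors propose.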
To gain further insight into the role of the clustered, regularly interspaced, short palindromic repeats (CRISPRs) in Escherichia coli, we analyzed the CRISPR diversity in a collection of 290 strains, in the phylogenetic framework of the strains represented by multilocus sequence typing (MLST). The set included 263 natural E. coli isolates exposed to various environments and isolated over a 20-year period from humans and animals, as well as 27 fully sequenced strains. Our analyses confirm that there are two largely independent pairs of CRISPR loci (CRISPR1 and -2 and CRISPR3 and -4), each associated with a different type of cas genes (Ecoli and Ypest, respectively), but that each pair of CRISPRs has similar dynamics. Strikingly, the major phylogenetic group B2 is almost devoid of CRISPRs. The majority of genomes analyzed lack Ypest cas genes and contain CRISPR3 with spacers matching Ypest cas genes. The analysis of relatedness between strains in terms of spacer repertoire and the MLST tree shows a pattern where closely related strains (MLST phylogenetic distance of <0.005 corresponding to at least hundreds of thousands of years) often exhibit identical CRISPRs while more distantly related strains (MLST distance of >0.01) exhibit completely different CRISPRs. This suggests rare but radical turnover of spacers in CRISPRs rather than gradual change. We found no link between the presence, size, or content of CRISPRs and the lifestyle of the strains. Our data suggest that, within the E. coli species, CRISPRs do not have the expected characteristics of a classical immune system.
Clustered regularly interspaced short palindromic repeats (CRISPRs) are a family of DNA direct repeats found in many prokaryotic genomes. Repeats of 21–37 bp typically show weak dyad symmetry and are separated by regularly sized, nonrepetitive spacer sequences. Four CRISPR-associated (Cas) protein families, designated Cas1 to Cas4, are strictly associated with CRISPR elements and always occur near a repeat cluster. Some spacers originate from mobile genetic elements and are thought to confer “immunity” against the elements that harbor these sequences. In the present study, we have systematically investigated uncharacterized proteins encoded in the vicinity of these CRISPRs and found many additional protein families that are strictly associated with CRISPR loci across multiple prokaryotic species. Multiple sequence alignments and hidden Markov models have been built for 45 Cas protein families. These models identify family members with high sensitivity and selectivity and classify key regulators of development, DevR and DevS, in Myxococcus xanthus as Cas proteins. These identifications show that CRISPR/cas gene regions can be quite large, with up to 20 different, tandem-arranged cas genes next to a repeat cluster or filling the region between two repeat clusters. Distinctive subsets of the collection of Cas proteins recur in phylogenetically distant species and correlate with characteristic repeat periodicity. The analyses presented here support initial proposals of mobility of these units, along with the likelihood that loci of different subtypes interact with one another as well as with host cell defensive, replicative, and regulatory systems. It is evident from this analysis that CRISPR/cas loci are larger, more complex, and more heterogeneous than previously appreciated.
The family of clustered regularly interspaced short palindromic repeats (CRISPRs) describes a class of DNA repeats found in nearly half of all bacterial and archaeal genomes. These DNA repeat regions have a remarkably regular structure: unique sequences of constant size, called spacers, sit between each pair of repeats. The DNA repeats do not encode proteins, but appear to be transcribed and processed into small RNAs that may have any number of functions, including resistance to any phage (i.e., virus of bacteria) whose sequence matches a spacer; spacers change rapidly as microbial strains evolve. This work describes 41 new CRISPR-associated (cas) gene families, which are always found near these repeats, in addition to the four previously known. It shows that CRISPR systems belong to different classes, with different repeat patterns, sets of genes, and species ranges. Most of these seem to come and go rather rapidly from their host genomes. These possibly beneficial mobile genetic elements may play an important role in driving prokaryotic evolution.
Clustered, regularly interspaced short palindromic repeats (CRISPR) provide bacteria and archaea with sequence-specific, acquired defense against plasmids and phage. Because mobile elements constitute up to 25% of the genome of multidrug-resistant (MDR) enterococci, it was of interest to examine the codistribution of CRISPR and acquired antibiotic resistance in enterococcal lineages. A database was built from 16 Enterococcus faecalis draft genome sequences to identify commonalities and polymorphisms in the location and content of CRISPR loci. With this data set, we were able to detect identities between CRISPR spacers and sequences from mobile elements, including pheromone-responsive plasmids and phage, suggesting that CRISPR regulates the flux of these elements through the E. faecalis species. Based on conserved locations of CRISPR and CRISPR-cas loci and the discovery of a new CRISPR locus with associated functional genes, CRISPR3-cas, we screened additional E. faecalis strains for CRISPR content, including isolates predating the use of antibiotics. We found a highly significant inverse correlation between the presence of a CRISPR-cas locus and acquired antibiotic resistance in E. faecalis, and examination of an additional eight E. faecium genomes yielded similar results for that species. A mechanism for CRISPR-cas loss in E. faecalis was identified. The inverse relationship between CRISPR-cas and antibiotic resistance suggests that antibiotic use inadvertently selects for enterococcal strains with compromised genome defense.
For many bacteria, including the opportunistically pathogenic enterococci, antibiotic resistance is mediated by acquisition of new DNA and is frequently encoded on mobile DNA elements such as plasmids and transposons. Certain enterococcal lineages have recently emerged that are characterized by abundant mobile DNA, including numerous viruses (phage), and plasmids and transposons encoding multiple antibiotic resistances. These lineages cause hospital infection outbreaks around the world. The striking influx of mobile DNA into these lineages is in contrast to what would be expected if a self (genome)-defense system was present. Clustered, regularly interspaced short palindromic repeat (CRISPR) defense is a recently discovered mechanism of prokaryotic self-defense that provides a type of acquired immunity. Here, we find that antibiotic resistance and possession of complete CRISPR loci are inversely related and that members of recently emerged high-risk enterococcal lineages lack complete CRISPR loci. Our results suggest that antibiotic therapy inadvertently selects for enterococci with compromised genome defense.
Bacteria and archaea face continual onslaughts of rapidly diversifying viruses and plasmids. Many prokaryotes maintain adaptive immune systems known as clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated genes (Cas). CRISPR-Cas systems are genomic sensors that serially acquire viral and plasmid DNA fragments (spacers) that are utilized to target and cleave matching viral and plasmid DNA in subsequent genomic invasions, offering critical immunological memory. Only 50% of sequenced bacteria possess CRISPR-Cas immunity, in contrast to over 90% of sequenced archaea. To probe why half of bacteria lack CRISPR-Cas immunity, we combined comparative genomics and mathematical modeling. Analysis of hundreds of diverse prokaryotic genomes shows that CRISPR-Cas systems are substantially more prevalent in thermophiles than in mesophiles. With sequenced bacteria disproportionately mesophilic and sequenced archaea mostly thermophilic, the presence of CRISPR-Cas appears to depend more on environmental temperature than on bacterial-archaeal taxonomy. Mutation rates are typically severalfold higher in mesophilic prokaryotes than in thermophilic prokaryotes. To quantitatively test whether accelerated viral mutation leads microbes to lose CRISPR-Cas systems, we developed a stochastic model of virus-CRISPR coevolution. The model competes CRISPR-Cas-positive (CRISPR-Cas+) prokaryotes against CRISPR-Cas-negative (CRISPR-Cas−) prokaryotes, continually weighing the antiviral benefits conferred by CRISPR-Cas immunity against its fitness costs. Tracking this cost-benefit analysis across parameter space reveals viral mutation rate thresholds beyond which CRISPR-Cas cannot provide sufficient immunity and is purged from host populations. These results offer a simple, testable viral diversity hypothesis to explain why mesophilic bacteria disproportionately lack CRISPR-Cas immunity. 
More generally, fundamental limits on the adaptability of biological sensors (Lamarckian evolution) are predicted.
A remarkable recent discovery in microbiology is that bacteria and archaea possess systems conferring immunological memory and adaptive immunity. Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated genes (CRISPR-Cas) are genomic sensors that allow prokaryotes to acquire DNA fragments from invading viruses and plasmids. Providing immunological memory, these stored fragments destroy matching DNA in future viral and plasmid invasions. CRISPR-Cas systems also provide adaptive immunity, keeping up with mutating viruses and plasmids by continually acquiring new DNA fragments. Surprisingly, less than 50% of mesophilic bacteria, in contrast to almost 90% of thermophilic bacteria and Archaea, maintain CRISPR-Cas immunity. Using mathematical modeling, we probe this dichotomy, showing how increased viral mutation rates can explain the reduced prevalence of CRISPR-Cas systems in mesophiles. Rapidly mutating viruses outrun CRISPR-Cas immune systems, likely decreasing their prevalence in bacterial populations. Thus, viral adaptability may select against, rather than for, immune adaptability in prokaryotes.
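The cost-benefit argument above can be made concrete with a deliberately simple deterministic toy model in Python. It is not the stochastic model developed in the study. Every function name, parameter and default value below (a 30-base spacer, the infection loss, the fitness cost) is an assumption introduced for illustration, and the model supposes that a single mutation anywhere in the targeted sequence lets the virus escape.

```python
# Toy cost-benefit sketch (assumed, not the authors' stochastic model).
# mu             : per-base viral mutation probability per infection cycle
# spacer_len     : length of the targeted sequence (assumed 30 bases)
# infection_loss : fitness lost to infection by an unprotected host (assumed)
# cost           : fitness cost of carrying CRISPR-Cas (assumed)


def escape_probability(mu: float, spacer_len: int = 30) -> float:
    """Chance that at least one targeted base has mutated."""
    return 1.0 - (1.0 - mu) ** spacer_len


def net_benefit(mu, infection_loss=0.5, cost=0.05, spacer_len=30):
    """Fitness advantage of CRISPR-Cas+ over CRISPR-Cas- hosts."""
    protection = 1.0 - escape_probability(mu, spacer_len)
    return infection_loss * protection - cost


def threshold_mutation_rate(infection_loss=0.5, cost=0.05, spacer_len=30):
    """Mutation rate at which the net benefit is zero (needs cost < loss)."""
    return 1.0 - (cost / infection_loss) ** (1.0 / spacer_len)


for mu in (0.001, 0.01, 0.1, 0.2):
    print(mu, net_benefit(mu) > 0)
print(threshold_mutation_rate())
```

Even in this crude form, the benefit of immunity falls as the viral mutation rate rises and changes sign at a threshold, which is the qualitative behavior the abstract describes. The numerical value of the threshold here depends entirely on the assumed parameters.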
All immune systems must distinguish self from non-self to repel invaders without inducing autoimmunity. Clustered, regularly interspaced, short palindromic repeat (CRISPR) loci protect bacteria and archaea from invasion by phage and plasmid DNA through a genetic interference pathway [1–9]. CRISPR loci are present in ~40% and ~90% of sequenced bacterial and archaeal genomes, respectively [10], and evolve rapidly, acquiring new spacer sequences to adapt to highly dynamic viral populations [1, 11–13]. Immunity requires a sequence match between the invasive DNA and the spacers that lie between CRISPR repeats [1–9]. Each cluster is genetically linked to a subset of the cas (CRISPR-associated) genes [14–16] that collectively encode >40 families of proteins involved in adaptation and interference. CRISPR loci encode small CRISPR RNAs (crRNAs) that contain a full spacer flanked by partial repeat sequences [2, 17–19]. CrRNA spacers are thought to identify targets by direct Watson-Crick pairing with invasive “protospacer” DNA [2, 3], but how they avoid targeting the spacer DNA within the encoding CRISPR locus itself is unknown. Here we have defined the mechanism of CRISPR self/non-self discrimination. In Staphylococcus epidermidis, target/crRNA mismatches at specific positions outside of the spacer sequence license foreign DNA for interference, whereas extended pairing between crRNA and CRISPR DNA repeats prevents autoimmunity. Hence, this CRISPR system uses the base-pairing potential of crRNAs not only to specify a target but also to spare the bacterial chromosome from interference. Differential complementarity outside of the spacer sequence is a built-in feature of all CRISPR systems, suggesting that this mechanism is a broadly applicable solution to the self/non-self dilemma that confronts all immune pathways.
CRISPR/Cas, bacterial and archaeal systems of interference with foreign genetic elements such as viruses or plasmids, consist of DNA loci called CRISPR cassettes (a set of variable spacers regularly separated by palindromic repeats) and associated cas genes. When a CRISPR spacer sequence exactly matches a sequence in a viral genome, the cell can become resistant to the virus. The CRISPR/Cas systems function through small RNAs originating from longer CRISPR cassette transcripts. While laboratory strains of Escherichia coli contain a functional CRISPR/Cas system (as judged by the appearance of phage resistance under conditions of artificial co-overexpression of cas genes and a CRISPR cassette engineered to target a λ phage), no natural phage resistance due to CRISPR system function has been observed in this best-studied organism, and no E. coli CRISPR spacer matches sequences of well-studied E. coli phages. To better understand the apparently “silent” E. coli CRISPR/Cas system, we systematically characterized processed transcripts from CRISPR cassettes. Using an engineered strain with a genomically located spacer matching phage λ, we show that endogenous levels of CRISPR cassette and cas gene expression allow only weak protection against infection with the phage. However, derepression of the CRISPR/Cas system by disruption of the hns gene leads to a high level of protection.
In prokaryotes, clustered regularly interspaced short palindromic repeats (CRISPRs) and their associated (Cas) proteins constitute a defence system against bacteriophages and plasmids. CRISPR/Cas systems acquire short spacer sequences from foreign genetic elements and incorporate these into their CRISPR arrays, generating a memory of past invaders. Defence is provided by short non-coding RNAs that guide Cas proteins to cleave complementary nucleic acids. While most spacers are acquired from phages and plasmids, there are examples of spacers that match genes elsewhere in the host bacterial chromosome. In Pectobacterium atrosepticum the type I-F CRISPR/Cas system has acquired a self-complementary spacer that perfectly matches a protospacer target in a horizontally acquired island (HAI2) involved in plant pathogenicity. Given the paucity of experimental data about CRISPR/Cas–mediated chromosomal targeting, we examined this process by developing a tightly controlled system. Chromosomal targeting was highly toxic via targeting of DNA and resulted in growth inhibition and cellular filamentation. The toxic phenotype was avoided by mutations in the cas operon, the CRISPR repeats, the protospacer target, and protospacer-adjacent motif (PAM) beside the target. Indeed, the natural self-targeting spacer was non-toxic due to a single nucleotide mutation adjacent to the target in the PAM sequence. Furthermore, we show that chromosomal targeting can result in large-scale genomic alterations, including the remodelling or deletion of entire pre-existing pathogenicity islands. These features can be engineered for the targeted deletion of large regions of bacterial chromosomes. In conclusion, in DNA–targeting CRISPR/Cas systems, chromosomal interference is deleterious by causing DNA damage and providing a strong selective pressure for genome alterations, which may have consequences for bacterial evolution and pathogenicity.
Bacteria have evolved mechanisms that provide protection from continual invasion by viruses and other foreign elements. Resistance systems, known as CRISPR/Cas, were recently discovered and equip bacteria and archaea with an “adaptive immune system.” This adaptive immunity provides a highly evolvable sequence-specific small RNA–based memory of past invasions by viruses and foreign genetic elements. There are many cases where these systems appear to target regions within the bacterial host's own genome (a possible autoimmunity), but the evolutionary rationale for this is unclear. Here, we demonstrate that CRISPR/Cas targeting of the host chromosome is highly toxic but that cells survive through mutations that alleviate the immune mechanism. We have used this phenotype to gain insight into how these systems function and show that large changes in the bacterial genome can occur. For example, targeting of a chromosomal pathogenicity island, important for virulence of the potato pathogen Pectobacterium atrosepticum, resulted in deletion of the island, which constituted ∼2% of the bacterial genome. These results have broad significance for the role of CRISPR/Cas systems and their impact on the evolution of bacterial genomes and virulence. In addition, this study demonstrates their potential as a tool for the targeted deletion of specific regions of bacterial chromosomes.
CRISPR-Cas systems are one of the most widespread phage resistance mechanisms in prokaryotes. Our lab recently identified the first examples of phage-borne anti-CRISPR genes that encode protein inhibitors of the type I-F CRISPR-Cas system of Pseudomonas aeruginosa. A key question arising from this work was whether there are other types of anti-CRISPR genes. In the current work, we address this question by demonstrating that some of the same phages carrying type I-F anti-CRISPR genes also possess genes that mediate inhibition of the type I-E CRISPR-Cas system of P. aeruginosa. We have discovered four distinct families of these type I-E anti-CRISPR genes. These genes do not inhibit the type I-F CRISPR-Cas system of P. aeruginosa or the type I-E system of Escherichia coli. Type I-E and I-F anti-CRISPR genes are located at the same position in the genomes of a large group of related P. aeruginosa phages, yet they are found in a variety of combinations and arrangements. We have also identified functional anti-CRISPR genes within nonprophage Pseudomonas genomic regions that are likely mobile genetic elements. This work emphasizes the potential importance of anti-CRISPR genes in phage evolution and lateral gene transfer and supports the hypothesis that more undiscovered families of anti-CRISPR genes exist. Finally, we provide the first demonstration that the type I-E CRISPR-Cas system of P. aeruginosa is naturally active without genetic manipulation, which contrasts with E. coli and other previously characterized I-E systems.
The CRISPR-Cas system is an adaptive immune system possessed by the majority of prokaryotic organisms to combat potentially harmful foreign genetic elements. This study reports the discovery of bacteriophage-encoded anti-CRISPR genes that mediate inhibition of a well-studied subtype of CRISPR-Cas system. The four families of anti-CRISPR genes described here, which comprise only the second group of anti-CRISPR genes to be identified, encode small proteins that bear no sequence similarity to previously studied phage or bacterial proteins. Anti-CRISPR genes represent a newly discovered and intriguing facet of the ongoing evolutionary competition between phages and their bacterial hosts.
Discriminating self and non-self is a universal requirement of immune systems. Adaptive immune systems in prokaryotes are centered around repetitive loci called CRISPRs (clustered regularly interspaced short palindromic repeats), into which invader DNA fragments are incorporated. CRISPR transcripts are processed into small RNAs that guide CRISPR-associated (Cas) proteins to invading nucleic acids by complementary base pairing. However, to avoid autoimmunity it is essential that these RNA-guides exclusively target invading DNA and not complementary DNA sequences (i.e., self-sequences) located in the host's own CRISPR locus. Previous work on the Type III-A CRISPR system from Staphylococcus epidermidis has demonstrated that a portion of the CRISPR RNA-guide sequence is involved in self versus non-self discrimination. This self-avoidance mechanism relies on sensing base pairing between the RNA-guide and sequences flanking the target DNA. To determine whether the RNA-guide participates in self versus non-self discrimination in the Type I-E system from Escherichia coli, we altered base pairing potential between the RNA-guide and the flanks of DNA targets. Here we demonstrate that Type I-E systems discriminate self from non-self through a base pairing-independent mechanism that strictly relies on the recognition of four unchangeable PAM sequences. In addition, this work reveals that the first base pair between the guide RNA and the PAM nucleotide immediately flanking the target sequence can be disrupted without affecting the interference phenotype. Remarkably, this indicates that base pairing at this position is not involved in foreign DNA recognition. Results in this paper reveal that the Type I-E mechanism of avoiding self sequences and preventing autoimmunity is fundamentally different from that employed by Type III-A systems. We propose the exclusive targeting of PAM-flanked sequences to be termed a target versus non-target discrimination mechanism.
CRISPR loci and their associated genes form a diverse set of adaptive immune systems that are widespread among prokaryotes. In these systems, the CRISPR-associated genes (cas) encode for proteins that capture fragments of invading DNA and integrate these sequences between repeat sequences of the host's CRISPR locus. This information is used upon re-infection to degrade invader genomes. Storing invader sequences in host genomes necessitates a mechanism to differentiate between invader sequences on invader genomes and invader sequences on the host genome. CRISPR-Cas of Staphylococcus epidermidis (Type III-A system) is inhibited when invader sequences are flanked by repeat sequences, and this prevents targeting of the CRISPR locus on the host genome. Here we demonstrate that Escherichia coli CRISPR-Cas (Type I-E system) is not inhibited by repeat sequences. Instead, this system is specifically activated by the presence of bona fide Protospacer Adjacent Motifs (PAMs) in the target. PAMs are conserved sequences adjoining invader sequences on the invader genome, and these sequences are never adjacent to invader sequences within host CRISPR loci. PAM recognition is not affected by base pairing potential of the target with the crRNA. As such, the Type I-E system lacks the ability to specifically recognize self DNA.
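The PAM-dependent logic described above can be illustrated with a short sketch. It is not the authors' method: the function name `is_interference_target`, the single example PAM `AAG`, and the toy sequences are assumptions made for illustration, and the sketch requires an exact spacer match, which real interference does not.

```python
def is_interference_target(upstream_flank, candidate, spacer, pams=("AAG",)):
    """Toy model of Type I-E target versus non-target discrimination.

    A candidate sequence is targeted only if it matches the spacer AND is
    immediately preceded by an accepted PAM. There is deliberately no check
    for repeat sequences, i.e. no explicit recognition of 'self'.
    The PAM set passed in is an illustrative assumption.
    """
    has_pam = any(upstream_flank.endswith(pam) for pam in pams)
    return candidate == spacer and has_pam


spacer = "GATTACAGATTACA"  # hypothetical spacer sequence

# The same sequence in two contexts: on an invader genome (PAM-flanked)
# and inside the host CRISPR array (flanked by the end of a made-up repeat).
on_invader = is_interference_target("CCTAAG", spacer, spacer)
in_host_array = is_interference_target("GGGAAC", spacer, spacer)
```

In this sketch the copy inside the host array escapes targeting only because no PAM sits next to it, which mirrors the point that the system activates on PAMs rather than recognizing self DNA.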
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR), together with associated genes (cas), form the CRISPR–cas adaptive immune system, which can provide resistance to viruses and plasmids in bacteria and archaea. Here, we use mathematical models, population dynamic experiments, and DNA sequence analyses to investigate the host–phage interactions in a model CRISPR–cas system, Streptococcus thermophilus DGCC7710 and its virulent phage 2972. At the molecular level, the bacteriophage-immune mutant bacteria (BIMs) and CRISPR–escape mutant phage (CEMs) obtained in this study are consistent with those anticipated from an iterative model of this adaptive immune system: resistance by the addition of novel spacers and phage evasion of resistance by mutation in matching sequences or flanking motifs. While CRISPR BIMs were readily isolated and CEMs generated at high rates (frequencies in excess of 10⁻⁶), our population studies indicate that there is more to the dynamics of phage–host interactions and the establishment of a BIM–CEM arms race than predicted from existing assumptions about phage infection and CRISPR–cas immunity. Among the unanticipated observations are: (i) the invasion of phage into populations of BIMs resistant by the acquisition of one (but not two) spacers, (ii) the survival of sensitive bacteria despite the presence of high densities of phage, and (iii) the maintenance of phage-limited communities due to the failure of even two-spacer BIMs to become established in populations with wild-type bacteria and phage. We attribute (i) to incomplete resistance of single-spacer BIMs. Based on the results of additional modeling and experiments, we postulate that (ii) and (iii) can be attributed to the phage infection-associated production of enzymes or other compounds that induce phenotypic phage resistance in sensitive bacteria and kill resistant BIMs. We present evidence in support of these hypotheses and discuss the implications of these results for the ecology and (co)evolution of bacteria and phage.
The evidence that the CRISPR regions of the genomes of archaea and bacteria play a role in the ecology and (co)evolution of these microbes and their viruses is overwhelming: (i) the spacers (variable sequences of 26–72 bp of DNA between the repeats of this region) of these prokaryotes are homologous to the DNA of viruses in their communities; (ii) experimentally, the acquisition and incorporation of spacers of viral DNA can protect these organisms from subsequent infection by these viruses; (iii) experimentally, viruses evade this immunity by mutation in homologous protospacers or protospacer-adjacent motifs (PAMs). Not so clear are the nature and magnitude of the role CRISPR plays in this ecology and evolution. Here, we use mathematical models, experiments with Streptococcus thermophilus and the phage 2972, and DNA sequence analyses to explore the contribution of CRISPR–cas immunity to the ecology and (co)evolution of bacteria and their viruses. The results of this study suggest that the contribution of CRISPR to the ecology of bacteria and phage is more modest and limited, and the conditions for a CRISPR–mediated coevolutionary arms race between these organisms more restrictive, than anticipated from models based on the canonical view of phage infection and CRISPR–cas immunity.
Clostridium difficile is an important human-pathogenic bacterium causing antibiotic-associated nosocomial infections worldwide. Mobile genetic elements and bacteriophages have helped shape C. difficile genome evolution. In many bacteria, phage infection may be controlled by a form of bacterial immunity called the clustered regularly interspaced short palindromic repeats/CRISPR-associated (CRISPR/Cas) system. This uses acquired short nucleotide sequences (spacers) to target homologous sequences (protospacers) in phage genomes. C. difficile carries multiple CRISPR arrays, and in this paper we examine the relationships between the host- and phage-carried elements of the system. We detected multiple matches between spacers and regions in 31 C. difficile phage and prophage genomes. A subset of the spacers was located in prophage-carried CRISPR arrays. The CRISPR spacer profiles generated suggest that related phages would have similar host ranges. Furthermore, we show that C. difficile strains of the same ribotype could either have similar or divergent CRISPR contents. Both synonymous and nonsynonymous mutations in the protospacer sequences were identified, as well as differences in the protospacer adjacent motif (PAM), which could explain how phages escape this system. This paper illustrates how the distribution and diversity of CRISPR spacers in C. difficile, and its prophages, could modulate phage predation for this pathogen and impact upon its evolution and pathogenicity.
Clostridium difficile is a significant bacterial human pathogen which undergoes continual genome evolution, resulting in the emergence of new virulent strains. Phages are major facilitators of genome evolution in other bacterial species, and we use sequence analysis-based approaches in order to examine whether the CRISPR/Cas system could control these interactions across divergent C. difficile strains. The presence of spacer sequences in prophages that are homologous to phage genomes raises an extra level of complexity in this predator-prey microbial system. Our results demonstrate that the impact of phage infection in this system is widespread and that the CRISPR/Cas system is likely to be an important aspect of the evolutionary dynamics in C. difficile.
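The spacer-to-protospacer matching described above can be sketched as a brute-force scan of both strands with a mismatch allowance. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the toy phage sequence, and the mismatch threshold are all hypothetical, and PAM differences (another escape route mentioned above) are not modeled.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def find_protospacers(spacer, genome, max_mismatches=0):
    """Return (position, strand, mismatches) for every genome window that
    matches the spacer, or its reverse complement, within max_mismatches."""
    hits = []
    n = len(spacer)
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        for i in range(len(genome) - n + 1):
            window = genome[i:i + n]
            mismatches = sum(a != b for a, b in zip(query, window))
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return hits


# Toy sequences, invented for illustration only.
phage = "TTGACCGTAGCTAGGATCCA"
mutant_phage = phage.replace("GCTAGG", "GCTTGG")  # one point mutation

exact = find_protospacers("CGTAGCTAGG", phage)
escaped = find_protospacers("CGTAGCTAGG", mutant_phage)
near = find_protospacers("CGTAGCTAGG", mutant_phage, max_mismatches=1)
```

The mutant sequence no longer yields an exact hit but is still found when one mismatch is allowed, which is how a point mutation in a protospacer could be flagged as a candidate escape variant.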
Clustered, Regularly Interspaced Short Palindromic Repeats (CRISPR) abound in the genomes of almost all archaebacteria and nearly half the eubacteria sequenced. Through a genetic interference mechanism, bacteria with CRISPR regions carrying copies of the DNA of previously encountered phage and plasmids abort the replication of phage and plasmids with these sequences. Thus it would seem that protection against infecting phage and plasmids is the selection pressure responsible for establishing and maintaining CRISPR in bacterial populations. But is it? To address this question and provide a framework and hypotheses for the experimental study of the ecology and evolution of CRISPR, I use mathematical models of the population dynamics of CRISPR-encoding bacteria with lytic phage and conjugative plasmids. The results of the numerical (computer simulation) analysis of the properties of these models with parameters in the ranges estimated for Escherichia coli and its phage and conjugative plasmids indicate: (1) In the presence of lytic phage there are broad conditions where bacteria with CRISPR-mediated immunity will have an advantage in competition with non-CRISPR bacteria with otherwise higher Malthusian fitness. (2) These conditions for the existence of CRISPR are narrower when there is envelope resistance to the phage. (3) While there are situations where CRISPR-mediated immunity can provide bacteria an advantage in competition with higher Malthusian fitness bacteria bearing deleterious conjugative plasmids, the conditions for this to obtain are relatively narrow and the intensity of selection favoring CRISPR weak. The parameters of these models can be independently estimated, the assumptions behind their construction validated, and the hypotheses generated from the analysis of their properties tested in experimental populations of bacteria with lytic phage and conjugative plasmids. I suggest protocols for estimating these parameters and outline the design of experiments to evaluate the validity of these models and test these hypotheses.
CRISPR is the acronym for the adaptive immune system that has been found in almost all archaebacteria and nearly half the eubacteria examined. Unlike the other defenses bacteria have for protection from phage and other deleterious DNAs, CRISPR has the virtues of specificity, memory, and the capacity to abort infections with a virtually indefinite diversity of deleterious DNAs. In this report, mathematical models of the population dynamics of bacteria, phage, and plasmids are used to determine the conditions under which CRISPR can become established and will be maintained in bacterial populations and the contribution of this adaptive immune system to the ecology and (co)evolution of bacteria and bacteriophage. The models predict realistic and broad conditions under which bacteria bearing CRISPR regions can invade and be maintained in populations of higher fitness bacteria confronted with bacteriophage and narrower conditions when the confrontation is with competitors carrying conjugative plasmids. The models predict that CRISPR can facilitate long-term co-evolutionary arms races between phage and bacteria and between phage- rather than resource-limited bacterial communities. The parameters of these models can be independently estimated, the assumptions behind their construction validated, and the hypotheses generated from the analysis of their properties tested with experimental populations of bacteria.
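To make the kind of population-dynamic reasoning described above concrete, the following is a deliberately simplified sketch. It is not the model from the study: the equations (logistic growth, mass-action adsorption), every parameter value, and the names `simulate` and `immune_fraction` are assumptions chosen for illustration. It leaves out envelope resistance, plasmids, spacer acquisition, and phage escape mutants.

```python
def simulate(phage0, hours=50.0, dt=0.01):
    """Euler integration of a toy sensitive / CRISPR-immune / phage model.

    S: phage-sensitive bacteria with the higher growth rate
    C: CRISPR-immune bacteria with a lower growth rate (assumed cost)
    P: lytic phage that infect only S
    All parameter values are illustrative assumptions.
    """
    r_s, r_c = 1.0, 0.9   # growth rates per hour
    K = 1e8               # shared carrying capacity (cells per ml)
    delta = 1e-8          # adsorption rate constant (ml per hour)
    beta = 50.0           # burst size (phage per lysed cell)
    m = 0.1               # phage decay rate per hour
    S, C, P = 1e6, 1e4, phage0
    for _ in range(int(hours / dt)):
        crowding = 1.0 - (S + C) / K
        dS = r_s * S * crowding - delta * S * P
        dC = r_c * C * crowding
        dP = beta * delta * S * P - m * P
        # Clamp at zero so an Euler step can never produce negative densities.
        S = max(S + dt * dS, 0.0)
        C = max(C + dt * dC, 0.0)
        P = max(P + dt * dP, 0.0)
    return S, C, P


def immune_fraction(S, C):
    """Fraction of the bacterial population that is CRISPR-immune."""
    return C / (S + C)


without_phage = immune_fraction(*simulate(phage0=0.0)[:2])
with_phage = immune_fraction(*simulate(phage0=1e4)[:2])
```

Under these assumed parameters the immune lineage loses ground when phage are absent and takes over when phage are present, which is the qualitative trade-off the models above are built to explore. Changing the cost (`r_c`) or adding other resistance routes would be the natural next experiments with such a sketch.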
Clustered regularly interspaced short palindromic repeats (CRISPR) in combination with associated sequences (cas) constitute the CRISPR-Cas immune system, which takes up DNA from invasive genetic elements as novel “spacers” that provide a genetic record of immunization events. We investigated the potential of CRISPR-based genotyping of Lactobacillus buchneri, a species relevant for commercial silage, bioethanol, and vegetable fermentations. Upon investigating the occurrence and diversity of CRISPR-Cas systems in Lactobacillus buchneri genomes, we observed a ubiquitous occurrence of CRISPR arrays containing a 36-nucleotide (nt) type II-A CRISPR locus adjacent to four cas genes, including the universal cas1 and cas2 genes and the type II signature gene cas9. Comparative analysis of CRISPR spacer content in 26 L. buchneri pickle fermentation isolates associated with spoilage revealed 10 unique locus genotypes that contained between 9 and 29 variable spacers. We observed a set of conserved spacers at the ancestral end, reflecting a common origin, as well as leader-end polymorphisms, reflecting recent divergence. Some of these spacers showed perfect identity with phage sequences, and many spacers showed homology to Lactobacillus plasmid sequences. Following a comparative analysis of sequences immediately flanking protospacers that matched CRISPR spacers, we identified a novel putative protospacer-adjacent motif (PAM), 5′-AAAA-3′. Overall, these findings suggest that type II-A CRISPR-Cas systems are valuable for genotyping of L. buchneri.
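The genotyping idea above (conserved ancestral-end spacers, variable leader-end spacers) can be sketched by treating each array as an ordered list of spacer identifiers. This is an illustrative sketch only: the isolate names, spacer labels, and helper names are hypothetical, and arrays are assumed to be written leader end first.

```python
def shared_trailer_spacers(array_a, array_b):
    """Count identical spacers shared from the ancestral (trailer) end.

    Arrays are assumed to be ordered leader -> trailer, so the comparison
    walks both lists from the back and stops at the first difference.
    """
    count = 0
    for a, b in zip(reversed(array_a), reversed(array_b)):
        if a != b:
            break
        count += 1
    return count


def group_by_genotype(isolates):
    """Group isolate names by their exact spacer content and order."""
    genotypes = {}
    for name, array in isolates.items():
        genotypes.setdefault(tuple(array), []).append(name)
    return genotypes


# Made-up isolates with made-up spacer labels.
isolates = {
    "iso1": ["s9", "s8", "s3", "s2", "s1"],
    "iso2": ["s7", "s3", "s2", "s1"],
    "iso3": ["s9", "s8", "s3", "s2", "s1"],
}
genotypes = group_by_genotype(isolates)
common_origin = shared_trailer_spacers(isolates["iso1"], isolates["iso2"])
```

Here the shared trailer-end run stands in for common ancestry, while the leader-end differences separate the two genotypes.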
We determined the genetic maps of the megaplasmids of six neurotoxigenic Clostridium butyricum type E strains from Italy using molecular and bioinformatics techniques. The megaplasmids are circular, not linear as we had previously proposed. The differently-sized megaplasmids share a genetic region that includes structural, metabolic and regulatory genes. In addition, we found that a 168 kb genetic region is present only in the larger megaplasmids of two tested strains, whereas it is absent from the smaller megaplasmids of the four remaining strains. The genetic region unique to the larger megaplasmids contains, among other features, a locus for clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR associated (cas) genes, i.e. a bacterial adaptive immune system providing sequence-specific protection from invading genetic elements. Some CRISPR spacer sequences of the neurotoxigenic C. butyricum type E strains showed homology to prophage, phage and plasmid sequences from closely related clostridia species or from distant species, all sharing the intestinal habitat, suggesting that the CRISPR locus might be involved in the microorganism adaptation to the human or animal intestinal environment. Besides, we report here that each of four distinct CRISPR spacers partially matched DNA sequences of different prophages and phages, at identical nucleotide locations. This suggests that, at least in neurotoxigenic C. butyricum type E, the CRISPR locus is potentially able to recognize the same conserved DNA sequence of different invading genetic elements, besides targeting sequences unique to previously encountered invading DNA, as currently predicted for a CRISPR locus. Thus, the results of this study introduce the possibility that CRISPR loci can provide resistance to a wider range of invading DNA elements than previously appreciated. Whether it is more advantageous for the peculiar neurotoxigenic C. butyricum type E strains to maintain or to lose the CRISPR-cas system remains an open question.
Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (cas) genes constitute the CRISPR-Cas systems found in the Bacteria and Archaea domains. At least in some strains they provide an efficient barrier against transmissible genetic elements such as plasmids and viruses. Two CRISPR-Cas systems have been identified in Escherichia coli, pertaining to subtypes I-E (cas-E genes) and I-F (cas-F genes), respectively. In order to unveil the evolutionary dynamics of such systems, we analyzed the sequence variations in the CRISPR-Cas loci of a collection of 131 E. coli strains. Our results show that the strain grouping inferred from these CRISPR data slightly differs from the phylogeny of the species, suggesting the occurrence of recombinational events between CRISPR arrays. Moreover, we determined that the primary cas-E genes of E. coli were altogether replaced with a substantially different variant in a minor group of strains that include K-12. Insertion elements play an important role in this variability. This result underlines the interchange capacity of CRISPR-Cas constituents and hints that at least some functional aspects documented for the K-12 system may not apply to the vast majority of E. coli strains.
Escherichia coli is a model microorganism for the study of diverse aspects such as microbial evolution and is a component of the human gut flora that may have a direct impact in everyday life. This work was undertaken with the purpose of elucidating the evolutionary pathways that have led to the present situation of its significantly different CRISPR-Cas subtypes (I-E and I-F) in several strains of E. coli. The resulting information offers a novel and wider understanding of the variety and relevance of these regions within the species. This knowledge may therefore help researchers better understand these systems for typing purposes and predict their behavior in strains that, depending on their particular genetic endowment, would show different levels of immunity to foreign genetic elements.
All archaeal and many bacterial genomes contain Clustered Regularly Interspaced Short Palindrome Repeats (CRISPR) and variable arrays of the CRISPR-associated (cas) genes that have been previously implicated in a novel form of DNA repair on the basis of comparative analysis of their protein product sequences. However, the proximity of CRISPR and cas genes strongly suggests that they have related functions which is hard to reconcile with the repair hypothesis.
The protein sequences of the numerous cas gene products were classified into ~25 distinct protein families; several new functional and structural predictions are described. Comparative-genomic analysis of CRISPR and cas genes leads to the hypothesis that the CRISPR-Cas system (CASS) is a mechanism of defense against invading phages and plasmids that functions analogously to the eukaryotic RNA interference (RNAi) systems. Specific functional analogies are drawn between several components of CASS and proteins involved in eukaryotic RNAi, including the double-stranded RNA-specific helicase-nuclease (dicer), the endonuclease cleaving target mRNAs (slicer), and the RNA-dependent RNA polymerase. However, none of the CASS components is orthologous to its apparent eukaryotic functional counterpart. It is proposed that unique inserts of CRISPR, some of which are homologous to fragments of bacteriophage and plasmid genes, function as prokaryotic siRNAs (psiRNA), by base-pairing with the target mRNAs and promoting their degradation or translation shutdown. Specific hypothetical schemes are developed for the functioning of the predicted prokaryotic siRNA system and for the formation of new CRISPR units with unique inserts encoding psiRNA conferring immunity to the respective newly encountered phages or plasmids. The unique inserts in CRISPR show virtually no similarity even between closely related bacterial strains which suggests their rapid turnover, on evolutionary scale. Corollaries of this finding are that, even among closely related prokaryotes, the most commonly encountered phages and plasmids are different and/or that the dominant phages and plasmids turn over rapidly.
We proposed previously that Cas proteins comprise a novel DNA repair system. The association of the cas genes with CRISPR and, especially, the presence, in CRISPR units, of unique inserts homologous to phage and plasmid genes make us abandon this hypothesis. It appears most likely that CASS is a prokaryotic system of defense against phages and plasmids that functions via the RNAi mechanism. The functioning of this system seems to involve integration of fragments of foreign genes into archaeal and bacterial chromosomes yielding heritable immunity to the respective agents. However, it appears that this inheritance is extremely unstable on the evolutionary scale such that the repertoires of unique psiRNAs are completely replaced even in closely related prokaryotes, presumably, in response to rapidly changing repertoires of dominant phages and plasmids.
This article was reviewed by: Eric Bapteste, Patrick Forterre, and Martijn Huynen.
CRISPR/Cas is a widespread adaptive immune system in prokaryotes. This system integrates short stretches of DNA derived from invading nucleic acids into genomic CRISPR loci, which function as memory of previously encountered invaders. In Escherichia coli, transcripts of these loci are cleaved into small RNAs and utilized by the Cascade complex to bind invader DNA, which is then likely degraded by Cas3 during CRISPR interference.
We describe how a CRISPR-activated E. coli K12 is cured of a high copy number plasmid under non-selective conditions in a CRISPR-mediated way. Cured clones integrated between one and five anti-plasmid spacers in genomic CRISPR loci. New spacers are integrated directly downstream of the leader sequence. The spacers are non-randomly selected to target protospacers with an AAG protospacer adjacent motif (PAM), which is located directly upstream of the protospacer. A co-occurrence of PAM deviations and CRISPR repeat mutations was observed, indicating that one nucleotide from the PAM is incorporated as the last nucleotide of the repeat during integration of a new spacer. When multiple spacers were integrated in a single clone, all spacers targeted the same strand of the plasmid, implying that CRISPR interference caused by the first integrated spacer directs subsequent spacer acquisition events in a strand-specific manner.
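The two acquisition rules reported above (protospacers sit directly downstream of an AAG motif, and new spacers enter at the leader end) can be sketched as follows. This is a toy illustration, not the experimental procedure: the function names, the default spacer length of 32, and the example plasmid string are assumptions, only one strand is scanned, and the observed strand bias and repeat-nucleotide carry-over are not modeled.

```python
def candidate_protospacers(dna, pam="AAG", spacer_len=32):
    """Yield each sequence lying directly downstream of a PAM occurrence.

    spacer_len is an assumed value; candidates that would run past the
    end of the sequence are skipped.
    """
    start = dna.find(pam)
    while start != -1:
        proto_start = start + len(pam)
        proto = dna[proto_start:proto_start + spacer_len]
        if len(proto) == spacer_len:
            yield proto
        start = dna.find(pam, start + 1)


def integrate_spacer(array, spacer):
    """Return a new array with the spacer added at the leader-proximal
    end (index 0); older spacers shift toward the trailer end."""
    return [spacer] + list(array)


# Invented plasmid fragment and a short spacer length for readability.
plasmid = "CCAAGTTTTGGGGCCAAGACGTACGT"
candidates = list(candidate_protospacers(plasmid, spacer_len=8))
array = integrate_spacer(["old2", "old1"], candidates[0])
```

Keeping the array ordered from leader to trailer means its order doubles as a record of acquisition history, newest first.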
The E. coli Type I-E CRISPR/Cas system provides resistance against bacteriophage infection, but also enables removal of residing plasmids. We established that there is a positive feedback loop between active spacers in a cluster (in our case the first acquired spacer) and spacers acquired thereafter, possibly through the use of specific DNA degradation products of the CRISPR interference machinery by the CRISPR adaptation machinery. This loop enables a rapid expansion of the spacer repertoire against an actively present DNA element that is already targeted, amplifying the CRISPR interference effect.
Streptococcus pyogenes, one of the major human pathogens, is a unique species since it has acquired diverse strain-specific virulence properties mainly through the acquisition of streptococcal prophages. In addition, S. pyogenes possesses clustered regularly interspaced short palindromic repeats (CRISPR)/Cas systems that can restrict horizontal gene transfer (HGT) including phage insertion. Therefore, it was of interest to examine the relationship between CRISPR and acquisition of prophages in S. pyogenes. Although two distinct CRISPR loci were found in S. pyogenes, some strains lacked CRISPR and these strains possess significantly more prophages than CRISPR-harboring strains. We also found that the number of spacers in S. pyogenes CRISPR was lower than in other streptococci. The spacer contents, however, suggested that the CRISPR loci limit phage insertions. In addition, we found a significant inverse correlation between the number of spacers and prophages in S. pyogenes. It was therefore suggested that S. pyogenes CRISPR have permitted phage insertion because they lack the relevant spacers. Interestingly, in two closely related S. pyogenes strains (SSI-1 and MGAS315), CRISPR activity appeared to be impaired following the insertion of phage genomes into the repeat sequences. Detailed analysis of this prophage insertion site suggested that MGAS315 is the ancestral strain of SSI-1. Analysis of 35 additional streptococcal genomes suggested that the influence of CRISPR on phage insertion varies among species even within the same genus. Our results suggested that limitations in CRISPR content could explain the characteristic acquisition of prophages and might contribute to strain-specific pathogenesis in S. pyogenes.
Studies of the Escherichia, Neisseria, Thermotoga, and Mycobacteria clustered regularly interspaced short palindromic repeat (CRISPR) subtypes have resulted in a model whereby CRISPRs function as a defense system against bacteriophage infection and conjugative plasmid transfer. In contrast, we previously showed that the Yersinia-subtype CRISPR region of Pseudomonas aeruginosa strain UCBPP-PA14 plays no detectable role in viral immunity but instead is required for bacteriophage DMS3-dependent inhibition of biofilm formation by P. aeruginosa. The goal of this study is to define the components of the Yersinia-subtype CRISPR region required to mediate this bacteriophage-host interaction. We show that the Yersinia-subtype-specific CRISPR-associated (Cas) proteins Csy4 and Csy2 are essential for small CRISPR RNA (crRNA) production in vivo, while the Csy1 and Csy3 proteins are not absolutely required for production of these small RNAs. Further, we present evidence that the core Cas protein Cas3 functions downstream of small crRNA production and that this protein requires functional HD (predicted phosphohydrolase) and DEXD/H (predicted helicase) domains to suppress biofilm formation in DMS3 lysogens. We also determined that only spacer 1, which is not identical to any region of the DMS3 genome, mediates the CRISPR-dependent loss of biofilm formation. Our evidence suggests that gene 42 of phage DMS3 (DMS3-42) is targeted by CRISPR2 spacer 1 and that this targeting tolerates multiple point mutations between the spacer and DMS3-42 target sequence. This work demonstrates how the interaction between P. aeruginosa strain UCBPP-PA14 and bacteriophage DMS3 can be used to further our understanding of the diverse roles of CRISPR system function in bacteria.
Lactococcus lactis is a biotechnological workhorse for food fermentations and potentially therapeutic products and is therefore widely consumed by humans. It is predominantly used as a starter microbe for fermented dairy products, and specialized strains have adapted from a plant environment through reductive evolution and horizontal gene transfer as evidenced by the association of adventitious traits with mobile elements. Specifically, L. lactis has armed itself with a myriad of plasmid-encoded bacteriophage defensive systems to protect against viral predation. This known arsenal had not included CRISPR/Cas (clustered regularly interspaced short palindromic repeats/CRISPR-associated proteins), which forms a remarkable microbial immunity system against invading DNA. Although CRISPR/Cas systems are common in the genomes of closely related lactic acid bacteria (LAB), none was identified within the eight published lactococcal genomes. Furthermore, a PCR-based search of the common LAB CRISPR/Cas systems (Types I and II) in 383 industrial L. lactis strains proved unsuccessful. Here we describe a novel, Type III, self-transmissible, plasmid-encoded, phage-interfering CRISPR/Cas discovered in L. lactis. The native CRISPR spacers confer resistance based on sequence identity to corresponding lactococcal phage. The interference is directed at phages problematic to the dairy industry, indicative of a responsive system. Moreover, targeting could be modified by engineering the spacer content. The 62.8-kb plasmid was shown to be conjugally transferrable to various strains. Its mobility should facilitate dissemination within microbial communities and provide a readily applicable system to naturally introduce CRISPR/Cas to industrially relevant strains for enhanced phage resistance and prevention against acquisition of undesirable genes.
Clustered regularly interspaced short palindromic repeats (CRISPR) constitute a bacterial and archaeal adaptive immune system that protects against bacteriophage (phage). Analysis of CRISPR loci reveals the history of phage infections and provides a direct link between phage and their hosts. All current tools for CRISPR identification have been developed to analyse completed genomes and are not well suited to the analysis of metagenomic data sets, where CRISPR loci are difficult to assemble owing to their repetitive structure and population heterogeneity. Here, we introduce a new algorithm, Crass, which is designed to identify and reconstruct CRISPR loci from raw metagenomic data without the need for assembly or prior knowledge of CRISPR in the data set. CRISPR in assembled data are often fragmented across many contigs/scaffolds and do not fully represent the population heterogeneity of CRISPR loci. Crass identified substantially more CRISPR in metagenomes previously analysed using assembly-based approaches. Using Crass, we were able to detect CRISPR that contained spacers with sequence homology to phage in the system, which would not have been identified using other approaches. The increased sensitivity, specificity and speed of Crass will facilitate comprehensive analysis of CRISPRs in metagenomic data sets, increasing our understanding of phage-host interactions and co-evolution within microbial communities.
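One way to picture assembly-free detection is to look for a repeat-spacer-repeat pattern inside a single read. The sketch below is not the Crass algorithm and makes no claim about its internals: the function name, the brute-force k-mer scan, and the small default lengths are assumptions chosen so the toy example stays readable.

```python
def find_direct_repeat(read, repeat_len=8, min_spacer=4, max_spacer=12):
    """Return (repeat, first_pos, second_pos) for the first k-mer that
    recurs at a spacing consistent with repeat-spacer-repeat, else None.

    The default lengths are toy values for illustration; real repeats
    and spacers are longer.
    """
    for i in range(len(read) - repeat_len + 1):
        kmer = read[i:i + repeat_len]
        earliest = i + repeat_len + min_spacer
        latest = i + repeat_len + max_spacer
        j = read.find(kmer, earliest)
        if j != -1 and j <= latest:
            return kmer, i, j
    return None


# Invented read: two copies of a made-up repeat around an 8-base spacer.
repeat = "GTTTTAGA"
crispr_read = "AC" + repeat + "CCATGCAT" + repeat + "TT"
hit = find_direct_repeat(crispr_read)
miss = find_direct_repeat("AAAACCCCGGGGTTTTACGTAGCATCGA")
```

Reads flagged this way could then be grouped by their repeat sequence, with the sequence between the two positions taken as a candidate spacer.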