Bacteria and archaea develop immunity against invading genomes by incorporating pieces of the invaders' sequences, called spacers, into a clustered regularly interspaced short palindromic repeats (CRISPR) locus between repeats, forming arrays of repeat-spacer units. When spacers are expressed, they direct CRISPR-associated (Cas) proteins to silence complementary invading DNA. In order to characterize the invaders of human microbiomes, we use spacers from CRISPR arrays that we had previously assembled from shotgun metagenomic datasets, and identify contigs that contain these spacers' targets.
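As a rough illustration of the spacer-to-target matching step described above, the following Python sketch scans contigs for exact matches to each spacer on either strand. It is not the authors' pipeline: real analyses typically rely on an alignment tool and tolerate some mismatches, and the function names and toy sequences here are assumptions made purely for this example.

```python
# Hypothetical sketch: exact-match search of CRISPR spacers against contigs.
# Function names and the exact-match criterion are illustrative assumptions.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_protospacers(spacers, contigs):
    """Return (spacer_id, contig_id, position, strand) for every exact hit.

    spacers and contigs are dicts mapping an identifier to a DNA string.
    """
    hits = []
    for spacer_id, spacer in spacers.items():
        forward = spacer.upper()
        queries = (("+", forward), ("-", reverse_complement(forward)))
        for strand, query in queries:
            for contig_id, contig in contigs.items():
                contig = contig.upper()
                start = contig.find(query)
                while start != -1:
                    hits.append((spacer_id, contig_id, start, strand))
                    # Step by one base so overlapping hits are not missed.
                    start = contig.find(query, start + 1)
    return hits
```

A contig hit by many distinct spacers would, under this toy criterion, be a stronger candidate for an invasive element than one hit by a single spacer.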
We discover 95,000 contigs that are putative invasive mobile genetic elements, some targeted by hundreds of CRISPR spacers. We find that oral sites in healthy human populations have a much greater variety of mobile genetic elements than stool samples. Mobile genetic elements carry genes encoding diverse functions: only 7% of the mobile genetic elements are similar to known phages or plasmids, although a much greater proportion contain phage- or plasmid-related genes. A small number of contigs share similarity with known integrative and conjugative elements, providing the first examples of CRISPR defenses against this class of element. We provide detailed analyses of a few large mobile genetic elements of various types, and a relative abundance analysis of mobile genetic elements and putative hosts, exploring the dynamic activities of mobile genetic elements in human microbiomes. A joint analysis of mobile genetic elements and CRISPRs shows that protospacer-adjacent motifs drive their interaction network; however, some CRISPR-Cas systems target mobile genetic elements lacking motifs.
We identify a large collection of invasive mobile genetic elements in human microbiomes, an important resource for further study of the interaction between the CRISPR-Cas immune system and invaders.
CRISPR-Cas system; human microbiome; mobile genetic element (MGE)
Well-studied innate immune systems exist throughout bacteria and archaea, but a more recently discovered genomic locus may offer prokaryotes surprising immunological adaptability. Mediated by a cassette-like genomic locus termed Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR), the microbial adaptive immune system differs from its eukaryotic immune analogues by incorporating new immunities unidirectionally. CRISPR thus stores genomically recoverable timelines of virus-host coevolution in natural organisms refractory to laboratory cultivation. Here we combined a population genetic mathematical model of CRISPR-virus coevolution with six years of metagenomic sequencing to link the recoverable genomic dynamics of CRISPR loci to the unknown population dynamics of virus and host in natural communities. Metagenomic reconstructions in an acid-mine drainage system document CRISPR loci conserving ancestral immune elements to the base-pair across thousands of microbial generations. This ‘trailer-end conservation’ occurs despite rapid viral mutation and despite rapid prokaryotic genomic deletion. The trailer-ends of many reconstructed CRISPR loci are also largely identical across a population. ‘Trailer-end clonality’ occurs despite predictions of host immunological diversity due to negative frequency dependent selection (kill the winner dynamics). Statistical clustering and model simulations explain this lack of diversity by capturing rapid selective sweeps by highly immune CRISPR lineages. Potentially explaining ‘trailer-end conservation,’ we record the first example of a viral bloom overwhelming a CRISPR system. The polyclonal viruses bloom even though they share sequences previously targeted by host CRISPR loci. Simulations show how increasing random genomic deletions in CRISPR loci purges immunological controls on long-lived viral sequences, allowing polyclonal viruses to bloom and depressing host fitness. 
Our results thus link documented patterns of genomic conservation in CRISPR loci to an evolutionary advantage against persistent viruses. By maintaining old immunities, selection may be tuning CRISPR-mediated immunity against viruses reemerging from lysogeny or migration.
Most microbes appear unculturable in the laboratory, limiting our knowledge of how virus and prokaryotic host evolve in natural systems. However, a genomic locus found in many prokaryotes, CRISPR, may offer cultivation-independent probes of virus-microbe coevolution. Utilizing nearby genes, CRISPR can serially incorporate short viral and plasmid sequences. These sequences bind and cleave cognate regions in subsequent viral and plasmid insertions, conferring adaptive anti-viral and anti-plasmid immunity. By incorporating sequences unidirectionally, CRISPR also provides timelines of virus-prokaryote coevolution. Yet, CRISPR only incorporates 30–80 base-pair viral sequences, leaving incomplete coevolutionary recordings. To reconstruct the missing coevolutionary dynamics shaping natural CRISPRs, we combined metagenomic reconstructions with population-scale mathematical modeling. Capturing rare and rapid sweeps of CRISPR diversity by highly immune lines, mathematical modeling explains why naturally reconstructed CRISPR loci are often largely identical across a population. Both model and experiment further document surprising proliferations of old viral sequences against which hosts had preexisting CRISPR immunity. Due to these deadly blooms of ancestral viral elements, CRISPR's conservation of old immune sequences appears to confer a selective advantage. This may explain the striking immunological memory documented in CRISPR loci, which occurs despite rapid viral mutation and despite rapid deletions in prokaryotic genomes.
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a prokaryotic adaptive defence system that provides resistance against alien replicons such as viruses and plasmids. Spacers in a CRISPR cassette confer immunity against viruses and plasmids containing regions complementary to the spacers and hence they retain a footprint of interactions between prokaryotes and their viruses in individual strains and ecosystems. The human gut is a rich habitat populated by numerous microorganisms, but a large fraction of these are unculturable and little is known about them in general and their CRISPR systems in particular.
We used human gut metagenomic data from three open projects in order to characterize the composition and dynamics of CRISPR cassettes in the human-associated microbiota. Applying available CRISPR-identification algorithms and a previously designed filtering procedure to the assembled human gut metagenomic contigs, we found 388 CRISPR cassettes, 373 of which had repeats not observed previously in complete genomes or other datasets. Only 171 of 3,545 identified spacers were coupled with protospacers from the human gut metagenomic contigs. The number of matches to GenBank sequences was negligible, providing protospacers for 26 spacers.
Reconstruction of CRISPR cassettes allowed us to track the dynamics of spacer content. In agreement with other published observations, we show that spacers shared by different cassettes (and hence likely older ones) tend to lie toward the trailer ends, whereas spacers with matches in the metagenomes are distributed unevenly across cassettes, preferentially forming clusters closer to the active end of a CRISPR cassette, adjacent to the leader, which suggests dynamic interactions between prokaryotes and viruses in the human gut. Remarkably, spacers match protospacers in the metagenome of the same individual with a frequency comparable to a random control, but may match protospacers from metagenomes of other individuals.
The analysis of assembled contigs is complementary to the approach based on the analysis of original reads and hence provides additional data about composition and evolution of CRISPR cassettes, revealing the dynamics of CRISPR-phage interactions in metagenomes.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-202) contains supplementary material, which is available to authorized users.
CRISPR; Human gut; Microbiome
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) are active in acquired resistance against bacteriophage and plasmids in a number of environments. In the human mouth, CRISPR loci evolve to counteract oral phage, but the expression of these CRISPR loci has not previously been investigated. We sequenced cDNA from CRISPR loci found in numerous different oral bacteria and compared with oral phage communities to determine whether the transcription of CRISPR loci is specifically targeted towards highly abundant phage present in the oral environment.
We found that of the 529,027 CRISPR spacer groups studied, 88% could be identified in transcripts, indicating that the vast majority of CRISPR loci in the oral cavity were transcribed. There were no strong associations between CRISPR spacer repertoires and oral health status or nucleic acid type. We also compared CRISPR repertoires with oral bacteriophage communities, and found that there was no significant association between CRISPR transcripts and oral phage, regardless of the CRISPR type being evaluated. We characterized highly expressed CRISPR spacers and found that they were no more likely than other spacers to match oral phage. By reassembling the CRISPR-bearing reads into longer CRISPR loci, we found that the majority of the loci did not have spacers matching viruses found in the oral cavities of the subjects studied. For some CRISPR types, loci containing spacers matching oral phage were significantly more likely to have multiple spacers rather than a single spacer matching oral phage.
These data suggest that the transcription of oral CRISPR loci is relatively ubiquitous and that highly expressed CRISPR spacers do not necessarily target the most abundant oral phage.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1615-0) contains supplementary material, which is available to authorized users.
CRISPR; Microbiome; Oral microbiome; Virome; Virus
Clustered, regularly interspaced short palindromic repeats (CRISPR) provide bacteria and archaea with sequence-specific, acquired defense against plasmids and phage. Because mobile elements constitute up to 25% of the genome of multidrug-resistant (MDR) enterococci, it was of interest to examine the codistribution of CRISPR and acquired antibiotic resistance in enterococcal lineages. A database was built from 16 Enterococcus faecalis draft genome sequences to identify commonalities and polymorphisms in the location and content of CRISPR loci. With this data set, we were able to detect identities between CRISPR spacers and sequences from mobile elements, including pheromone-responsive plasmids and phage, suggesting that CRISPR regulates the flux of these elements through the E. faecalis species. Based on conserved locations of CRISPR and CRISPR-cas loci and the discovery of a new CRISPR locus with associated functional genes, CRISPR3-cas, we screened additional E. faecalis strains for CRISPR content, including isolates predating the use of antibiotics. We found a highly significant inverse correlation between the presence of a CRISPR-cas locus and acquired antibiotic resistance in E. faecalis, and examination of an additional eight E. faecium genomes yielded similar results for that species. A mechanism for CRISPR-cas loss in E. faecalis was identified. The inverse relationship between CRISPR-cas and antibiotic resistance suggests that antibiotic use inadvertently selects for enterococcal strains with compromised genome defense.
For many bacteria, including the opportunistically pathogenic enterococci, antibiotic resistance is mediated by acquisition of new DNA and is frequently encoded on mobile DNA elements such as plasmids and transposons. Certain enterococcal lineages have recently emerged that are characterized by abundant mobile DNA, including numerous viruses (phage), and plasmids and transposons encoding multiple antibiotic resistances. These lineages cause hospital infection outbreaks around the world. The striking influx of mobile DNA into these lineages is in contrast to what would be expected if a self (genome)-defense system was present. Clustered, regularly interspaced short palindromic repeat (CRISPR) defense is a recently discovered mechanism of prokaryotic self-defense that provides a type of acquired immunity. Here, we find that antibiotic resistance and possession of complete CRISPR loci are inversely related and that members of recently emerged high-risk enterococcal lineages lack complete CRISPR loci. Our results suggest that antibiotic therapy inadvertently selects for enterococci with compromised genome defense.
The human bacterial pathogen Listeria monocytogenes is emerging as a model organism to study RNA-mediated regulation in pathogenic bacteria. A class of non-coding RNAs called CRISPRs (clustered regularly interspaced short palindromic repeats) has been described to confer bacterial resistance against invading bacteriophages and conjugative plasmids. CRISPR function relies on the activity of CRISPR associated (cas) genes that encode a large family of proteins with nuclease or helicase activities and DNA and RNA binding domains. Here, we characterized a CRISPR element (RliB) that is expressed and processed in the L. monocytogenes strain EGD-e, which is completely devoid of cas genes. Structural probing revealed that RliB has an unexpected secondary structure comprising basepair interactions between the repeats and the adjacent spacers in place of canonical hairpins formed by the palindromic repeats. Moreover, in contrast to other CRISPR-Cas systems identified in Listeria, RliB-CRISPR is ubiquitously present among Listeria genomes at the same genomic locus and is never associated with the cas genes. We showed that RliB-CRISPR is a substrate for the endogenously encoded polynucleotide phosphorylase (PNPase) enzyme. The spacers of the different Listeria RliB-CRISPRs share many sequences with temperate and virulent phages. Furthermore, we show that a cas-less RliB-CRISPR lowers the acquisition frequency of a plasmid carrying the matching protospacer, provided that trans-encoded cas genes of a second CRISPR-Cas system are present in the genome. Importantly, we show that PNPase is required for RliB-CRISPR-mediated DNA interference. Altogether, our data reveal a previously undescribed CRISPR system whose processing and activity both depend on PNPase, highlighting a new and unexpected function for PNPase in “CRISPRology”.
CRISPR-Cas systems confer to bacteria and archaea an adaptive immunity that protects them against invading bacteriophages and plasmids. In this study, we characterize a CRISPR (RliB-CRISPR) that is present in all L. monocytogenes strains at the same genomic locus but is never associated with a cas operon. It is an unusual CRISPR that, as we demonstrate, has a secondary structure consisting of basepair interactions between the repeat sequence and the adjacent spacer. We show that the RliB-CRISPR is processed by the endogenously encoded polynucleotide phosphorylase enzyme (PNPase). In addition, we show that the RliB-CRISPR system requires PNPase and presence of trans encoded cas genes of a second CRISPR-Cas system, to mediate DNA interference directed against a plasmid carrying a matching protospacer. Altogether, our data reveal a novel type of CRISPR system in bacteria that requires endogenously encoded PNPase enzyme for its processing and interference activity.
Clustered regularly interspaced short palindromic repeats (CRISPR) constitute a bacterial and archaeal adaptive immune system that protects against bacteriophage (phage). Analysis of CRISPR loci reveals the history of phage infections and provides a direct link between phage and their hosts. All current tools for CRISPR identification have been developed to analyse completed genomes and are not well suited to the analysis of metagenomic data sets, where CRISPR loci are difficult to assemble owing to their repetitive structure and population heterogeneity. Here, we introduce a new algorithm, Crass, which is designed to identify and reconstruct CRISPR loci from raw metagenomic data without the need for assembly or prior knowledge of CRISPR in the data set. CRISPR in assembled data are often fragmented across many contigs/scaffolds and do not fully represent the population heterogeneity of CRISPR loci. Crass identified substantially more CRISPR in metagenomes previously analysed using assembly-based approaches. Using Crass, we were able to detect CRISPR that contained spacers with sequence homology to phage in the system, which would not have been identified using other approaches. The increased sensitivity, specificity and speed of Crass will facilitate comprehensive analysis of CRISPRs in metagenomic data sets, increasing our understanding of phage-host interactions and co-evolution within microbial communities.
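To make the idea of finding CRISPR signal directly in unassembled reads more concrete, here is a minimal Python sketch that flags a read when the same k-mer occurs twice, separated by a spacer-sized gap. This is not the Crass algorithm, whose internals are not described here; the function name, the k-mer length and the gap bounds are all assumptions chosen for illustration.

```python
# Hypothetical sketch, NOT the Crass implementation: flag reads in which a
# k-mer recurs after a spacer-sized gap, as a crude direct-repeat signal.

def find_direct_repeats(read, k=8, min_gap=20, max_gap=60):
    """Return (kmer, first_pos, second_pos) for k-mers recurring in a read.

    min_gap and max_gap bound the number of bases between the end of the
    first copy and the start of the second (assumed spacer length range).
    """
    seen = {}        # kmer -> list of start positions seen so far
    candidates = []
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        for j in seen.get(kmer, []):
            gap = i - (j + k)
            if min_gap <= gap <= max_gap:
                candidates.append((kmer, j, i))
        seen.setdefault(kmer, []).append(i)
    return candidates
```

A real tool would also need to extend the seed k-mer to the full repeat, handle sequencing errors, and group reads that share a repeat; none of that is attempted in this sketch.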
Bacteria and archaea face continual onslaughts of rapidly diversifying viruses and plasmids. Many prokaryotes maintain adaptive immune systems known as clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated genes (Cas). CRISPR-Cas systems are genomic sensors that serially acquire viral and plasmid DNA fragments (spacers) that are utilized to target and cleave matching viral and plasmid DNA in subsequent genomic invasions, offering critical immunological memory. Only 50% of sequenced bacteria possess CRISPR-Cas immunity, in contrast to over 90% of sequenced archaea. To probe why half of bacteria lack CRISPR-Cas immunity, we combined comparative genomics and mathematical modeling. Analysis of hundreds of diverse prokaryotic genomes shows that CRISPR-Cas systems are substantially more prevalent in thermophiles than in mesophiles. With sequenced bacteria disproportionately mesophilic and sequenced archaea mostly thermophilic, the presence of CRISPR-Cas appears to depend more on environmental temperature than on bacterial-archaeal taxonomy. Mutation rates are typically severalfold higher in mesophilic prokaryotes than in thermophilic prokaryotes. To quantitatively test whether accelerated viral mutation leads microbes to lose CRISPR-Cas systems, we developed a stochastic model of virus-CRISPR coevolution. The model competes CRISPR-Cas-positive (CRISPR-Cas+) prokaryotes against CRISPR-Cas-negative (CRISPR-Cas−) prokaryotes, continually weighing the antiviral benefits conferred by CRISPR-Cas immunity against its fitness costs. Tracking this cost-benefit analysis across parameter space reveals viral mutation rate thresholds beyond which CRISPR-Cas cannot provide sufficient immunity and is purged from host populations. These results offer a simple, testable viral diversity hypothesis to explain why mesophilic bacteria disproportionately lack CRISPR-Cas immunity. 
More generally, fundamental limits on the adaptability of biological sensors (Lamarckian evolution) are predicted.
A remarkable recent discovery in microbiology is that bacteria and archaea possess systems conferring immunological memory and adaptive immunity. Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated genes (CRISPR-Cas) are genomic sensors that allow prokaryotes to acquire DNA fragments from invading viruses and plasmids. Providing immunological memory, these stored fragments destroy matching DNA in future viral and plasmid invasions. CRISPR-Cas systems also provide adaptive immunity, keeping up with mutating viruses and plasmids by continually acquiring new DNA fragments. Surprisingly, less than 50% of mesophilic bacteria, in contrast to almost 90% of thermophilic bacteria and Archaea, maintain CRISPR-Cas immunity. Using mathematical modeling, we probe this dichotomy, showing how increased viral mutation rates can explain the reduced prevalence of CRISPR-Cas systems in mesophiles. Rapidly mutating viruses outrun CRISPR-Cas immune systems, likely decreasing their prevalence in bacterial populations. Thus, viral adaptability may select against, rather than for, immune adaptability in prokaryotes.
Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (cas) genes constitute the CRISPR-Cas systems found in the Bacteria and Archaea domains. At least in some strains they provide an efficient barrier against transmissible genetic elements such as plasmids and viruses. Two CRISPR-Cas systems have been identified in Escherichia coli, pertaining to subtypes I-E (cas-E genes) and I-F (cas-F genes), respectively. In order to unveil the evolutionary dynamics of such systems, we analyzed the sequence variations in the CRISPR-Cas loci of a collection of 131 E. coli strains. Our results show that the strain grouping inferred from these CRISPR data slightly differs from the phylogeny of the species, suggesting the occurrence of recombinational events between CRISPR arrays. Moreover, we determined that the primary cas-E genes of E. coli were altogether replaced with a substantially different variant in a minor group of strains that include K-12. Insertion elements play an important role in this variability. This result underlines the interchange capacity of CRISPR-Cas constituents and hints that at least some functional aspects documented for the K-12 system may not apply to the vast majority of E. coli strains.
Escherichia coli is a model microorganism for the study of diverse subjects, including microbial evolution, and is a component of the human gut flora that may have a direct impact on everyday life. This work was undertaken to elucidate the evolutionary pathways that have led to the present distribution of its two markedly different CRISPR-Cas subtypes (I-E and I-F) among E. coli strains. The resulting information offers a novel and wider understanding of the variety and relevance of these regions within the species. This knowledge may help researchers use these systems for typing purposes and predict their behavior in strains that, depending on their particular genetic makeup, would show different levels of immunity to foreign genetic elements.
Clustered regularly interspaced short palindromic repeats (CRISPR) are hypervariable loci widely distributed in prokaryotes that provide acquired immunity against foreign genetic elements. Here, we characterize a novel Streptococcus thermophilus locus, CRISPR3, and experimentally demonstrate its ability to integrate novel spacers in response to bacteriophage. Also, we analyze CRISPR diversity and activity across three distinct CRISPR loci in several S. thermophilus strains. We show that both CRISPR repeats and cas genes are locus specific and functionally coupled. A total of 124 strains were studied, and 109 unique spacer arrangements were observed across the three CRISPR loci. Overall, 3,626 spacers were analyzed, including 2,829 for CRISPR1 (782 unique), 173 for CRISPR2 (16 unique), and 624 for CRISPR3 (154 unique). Sequence analysis of the spacers revealed homology and identity to phage sequences (77%), plasmid sequences (16%), and S. thermophilus chromosomal sequences (7%). Polymorphisms were observed for the CRISPR repeats, CRISPR spacers, cas genes, CRISPR motif, locus architecture, and specific sequence content. Interestingly, CRISPR loci evolved both via polarized addition of novel spacers after exposure to foreign genetic elements and via internal deletion of spacers. We hypothesize that the level of diversity is correlated with relative CRISPR activity and propose that the activity is highest for CRISPR1, followed by CRISPR3, while CRISPR2 may be degenerate. Globally, the dynamic nature of CRISPR loci might prove valuable for typing and comparative analyses of strains and microbial populations. Also, CRISPRs provide critical insights into the relationships between prokaryotes and their environments, notably the coevolution of host and viral genomes.
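The spacer tallies above can be checked with a few lines of Python. The sketch below sums the per-locus counts and ranks the loci by the fraction of spacers that are unique. Using that fraction as a diversity proxy is an assumption made for this example and may not be the measure the authors used.

```python
# (total spacers, unique spacers) per locus, as reported in the text above.
spacer_counts = {
    "CRISPR1": (2829, 782),
    "CRISPR2": (173, 16),
    "CRISPR3": (624, 154),
}

total_spacers = sum(total for total, _ in spacer_counts.values())

# Assumed diversity proxy: fraction of a locus's spacers that are unique.
unique_fraction = {
    locus: unique / total for locus, (total, unique) in spacer_counts.items()
}

# Loci ordered from most to least diverse under this proxy.
ranking = sorted(unique_fraction, key=unique_fraction.get, reverse=True)

print(total_spacers)
for locus in ranking:
    print(f"{locus}: {unique_fraction[locus]:.1%} unique")
```

The per-locus totals add up to the 3,626 spacers reported, and under this simple proxy the ordering (CRISPR1, then CRISPR3, then CRISPR2) agrees with the proposed ranking of CRISPR activity.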
Gardnerella vaginalis is identified as the predominant colonist of the vaginal tracts of women diagnosed with bacterial vaginosis (BV). G. vaginalis can be isolated from healthy women, and an asymptomatic BV state is also recognised. The association of G. vaginalis with different clinical phenotypes could be explained by different cytotoxicity of the strains, presumably based on disparate gene content. The contribution of horizontal gene transfer to shaping the genomes of G. vaginalis is acknowledged. The CRISPR loci of the recently discovered CRISPR/Cas microbial defence system provide a historical view of the exposure of prokaryotes to a variety of foreign genetic elements.
The CRISPR/Cas loci were analysed using available sequence data from three G. vaginalis complete genomes and 18 G. vaginalis draft genomes in the NCBI database, as well as PCR amplicons of the genomic DNA of 17 clinical isolates. The cas genes in the CRISPR/Cas loci of G. vaginalis belong to the E. coli subtype. Approximately 20% of the spacers had matches in the GenBank database. Sequence analysis of the CRISPR arrays revealed that nearly half of the spacers matched G. vaginalis chromosomal sequences. The spacers that matched G. vaginalis chromosomal sequences were determined to not be self-targeting and were presumably neither constituents of mobile-element-associated genes nor derived from plasmids/viruses. The protospacers targeted by these spacers displayed conserved protospacer-adjacent motifs.
The CRISPR/Cas system has been identified in about one half of the analysed G. vaginalis strains. Our analysis of CRISPR sequences did not reveal a potential link between their presence and the virulence of the G. vaginalis strains. Based on the origins of the spacers found in the G. vaginalis CRISPR arrays, we hypothesise that the transfer of genetic material among G. vaginalis strains could be regulated by the CRISPR/Cas mechanism. The present study is the first attempt to determine and analyse the CRISPR loci of bacteria isolated from the human vaginal tract.
Gardnerella vaginalis; Bacterial vaginosis; CRISPR/Cas; Spacer; Repeat; PAM
The CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats – CRISPR associated proteins) system provides adaptive immunity in archaea and bacteria. A hallmark of CRISPR-Cas is the involvement of short crRNAs that guide associated proteins in the destruction of invading DNA or RNA. We present three fundamentally distinct processing pathways in the cyanobacterium Synechocystis sp. PCC6803 for a subtype I-D (CRISPR1), and two type III systems (CRISPR2 and CRISPR3), which are located together on the plasmid pSYSA. Using high-throughput transcriptome analyses and assays of transcript accumulation we found all CRISPR loci to be highly expressed, but the individual crRNAs had profoundly varying abundances despite single transcription start sites for each array. In a computational analysis, CRISPR3 spacers with stable secondary structures displayed a greater ratio of degradation products. These structures might interfere with the loading of the crRNAs into RNP complexes, explaining the varying abundances. The maturation of CRISPR1 and CRISPR2 transcripts depends on at least two different Cas6 proteins. Mutation of gene sll7090, encoding a Cmr2 protein, led to the disappearance of all CRISPR3-derived crRNAs, providing in vivo evidence for a function of Cmr2 in the maturation, regulation of expression, Cmr complex formation or stabilization of CRISPR3 transcripts. Finally, we optimized CRISPR repeat structure prediction and the results indicate that the spacer context can influence individual repeat structures.
All immune systems must distinguish self from non-self to repel invaders without inducing autoimmunity. Clustered, regularly interspaced, short palindromic repeat (CRISPR) loci protect bacteria and archaea from invasion by phage and plasmid DNA through a genetic interference pathway [1–9]. CRISPR loci are present in ~40% and ~90% of sequenced bacterial and archaeal genomes respectively [10] and evolve rapidly, acquiring new spacer sequences to adapt to highly dynamic viral populations [1, 11–13]. Immunity requires a sequence match between the invasive DNA and the spacers that lie between CRISPR repeats [1–9]. Each cluster is genetically linked to a subset of the cas (CRISPR-associated) genes [14–16] that collectively encode >40 families of proteins involved in adaptation and interference. CRISPR loci encode small CRISPR RNAs (crRNAs) that contain a full spacer flanked by partial repeat sequences [2, 17–19]. CrRNA spacers are thought to identify targets by direct Watson-Crick pairing with invasive “protospacer” DNA [2, 3], but how they avoid targeting the spacer DNA within the encoding CRISPR locus itself is unknown. Here we have defined the mechanism of CRISPR self/non-self discrimination. In Staphylococcus epidermidis, target/crRNA mismatches at specific positions outside of the spacer sequence license foreign DNA for interference, whereas extended pairing between crRNA and CRISPR DNA repeats prevents autoimmunity. Hence, this CRISPR system uses the base-pairing potential of crRNAs not only to specify a target but also to spare the bacterial chromosome from interference. Differential complementarity outside of the spacer sequence is a built-in feature of all CRISPR systems, suggesting that this mechanism is a broadly applicable solution to the self/non-self dilemma that confronts all immune pathways.
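The self/non-self rule described above can be sketched as a small decision function. In this Python illustration a target is spared when the sequence flanking its protospacer matches the repeat-derived flank of the crRNA, and is licensed for interference when that flank differs. Several simplifications are assumptions of this example: same-strand DNA identity stands in for crRNA/DNA base pairing, mismatches are counted anywhere in the flank rather than at the specific positions the study identifies, and the flank length and threshold are hypothetical.

```python
# Hypothetical sketch of flank-based self/non-self discrimination.
# Same-strand identity is used as a proxy for crRNA/DNA base pairing.

def classify_target(upstream_flank, protospacer, spacer, repeat_flank,
                    min_mismatches=1):
    """Classify a candidate target as 'no match', 'spared (self)' or
    'interference'.

    upstream_flank: sequence next to the protospacer in the target DNA.
    repeat_flank:   repeat-derived sequence next to the spacer in the
                    CRISPR locus (assumed to be the same length).
    """
    if protospacer != spacer:
        return "no match"
    if len(upstream_flank) != len(repeat_flank):
        raise ValueError("flanks must have the same length")
    mismatches = sum(a != b for a, b in zip(upstream_flank, repeat_flank))
    # Extended pairing (no mismatches) marks the CRISPR locus itself.
    if mismatches >= min_mismatches:
        return "interference"
    return "spared (self)"
```

Under this rule the spacer inside the CRISPR array is always spared, because it is flanked by the repeat itself, whereas the same sequence in a phage or plasmid is flanked by unrelated DNA and is therefore targeted.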
Clustered regularly interspaced short palindromic repeats (CRISPRs) are a family of DNA direct repeats found in many prokaryotic genomes. Repeats of 21–37 bp typically show weak dyad symmetry and are separated by regularly sized, nonrepetitive spacer sequences. Four CRISPR-associated (Cas) protein families, designated Cas1 to Cas4, are strictly associated with CRISPR elements and always occur near a repeat cluster. Some spacers originate from mobile genetic elements and are thought to confer “immunity” against the elements that harbor these sequences. In the present study, we have systematically investigated uncharacterized proteins encoded in the vicinity of these CRISPRs and found many additional protein families that are strictly associated with CRISPR loci across multiple prokaryotic species. Multiple sequence alignments and hidden Markov models have been built for 45 Cas protein families. These models identify family members with high sensitivity and selectivity and classify key regulators of development, DevR and DevS, in Myxococcus xanthus as Cas proteins. These identifications show that CRISPR/cas gene regions can be quite large, with up to 20 different, tandem-arranged cas genes next to a repeat cluster or filling the region between two repeat clusters. Distinctive subsets of the collection of Cas proteins recur in phylogenetically distant species and correlate with characteristic repeat periodicity. The analyses presented here support initial proposals of mobility of these units, along with the likelihood that loci of different subtypes interact with one another as well as with host cell defensive, replicative, and regulatory systems. It is evident from this analysis that CRISPR/cas loci are larger, more complex, and more heterogeneous than previously appreciated.
The family of clustered regularly interspaced short palindromic repeats (CRISPRs) describes a class of DNA repeats found in nearly half of all bacterial and archaeal genomes. These DNA repeat regions have a remarkably regular structure: unique sequences of constant size, called spacers, sit between each pair of repeats. The DNA repeats do not encode proteins, but appear to be transcribed and processed into small RNAs that may have any number of functions, including resistance to any phage (i.e., virus of bacteria) whose sequence matches a spacer; spacers change rapidly as microbial strains evolve. This work describes 41 new CRISPR-associated (cas) gene families, which are always found near these repeats, in addition to the four previously known. It shows that CRISPR systems belong to different classes, with different repeat patterns, sets of genes, and species ranges. Most of these seem to come and go rather rapidly from their host genomes. These possibly beneficial mobile genetic elements may play an important role in driving prokaryotic evolution.
In prokaryotes, clustered regularly interspaced short palindromic repeats (CRISPRs) and their associated (Cas) proteins constitute a defence system against bacteriophages and plasmids. CRISPR/Cas systems acquire short spacer sequences from foreign genetic elements and incorporate these into their CRISPR arrays, generating a memory of past invaders. Defence is provided by short non-coding RNAs that guide Cas proteins to cleave complementary nucleic acids. While most spacers are acquired from phages and plasmids, there are examples of spacers that match genes elsewhere in the host bacterial chromosome. In Pectobacterium atrosepticum the type I-F CRISPR/Cas system has acquired a self-complementary spacer that perfectly matches a protospacer target in a horizontally acquired island (HAI2) involved in plant pathogenicity. Given the paucity of experimental data about CRISPR/Cas–mediated chromosomal targeting, we examined this process by developing a tightly controlled system. Chromosomal targeting was highly toxic via targeting of DNA and resulted in growth inhibition and cellular filamentation. The toxic phenotype was avoided by mutations in the cas operon, the CRISPR repeats, the protospacer target, and protospacer-adjacent motif (PAM) beside the target. Indeed, the natural self-targeting spacer was non-toxic due to a single nucleotide mutation adjacent to the target in the PAM sequence. Furthermore, we show that chromosomal targeting can result in large-scale genomic alterations, including the remodelling or deletion of entire pre-existing pathogenicity islands. These features can be engineered for the targeted deletion of large regions of bacterial chromosomes. In conclusion, in DNA–targeting CRISPR/Cas systems, chromosomal interference is deleterious by causing DNA damage and providing a strong selective pressure for genome alterations, which may have consequences for bacterial evolution and pathogenicity.
Bacteria have evolved mechanisms that provide protection from continual invasion by viruses and other foreign elements. Resistance systems, known as CRISPR/Cas, were recently discovered and equip bacteria and archaea with an “adaptive immune system.” This adaptive immunity provides a highly evolvable sequence-specific small RNA–based memory of past invasions by viruses and foreign genetic elements. There are many cases where these systems appear to target regions within the bacterial host's own genome (a possible autoimmunity), but the evolutionary rationale for this is unclear. Here, we demonstrate that CRISPR/Cas targeting of the host chromosome is highly toxic but that cells survive through mutations that alleviate the immune mechanism. We have used this phenotype to gain insight into how these systems function and show that large changes in the bacterial genome can occur. For example, targeting of a chromosomal pathogenicity island, important for virulence of the potato pathogen Pectobacterium atrosepticum, resulted in deletion of the island, which constituted ∼2% of the bacterial genome. These results have broad significance for the role of CRISPR/Cas systems and their impact on the evolution of bacterial genomes and virulence. In addition, this study demonstrates their potential as a tool for the targeted deletion of specific regions of bacterial chromosomes.
To gain further insight into the role of the clustered, regularly interspaced, short palindromic repeats (CRISPRs) in Escherichia coli, we analyzed the CRISPR diversity in a collection of 290 strains, in the phylogenetic framework of the strains represented by multilocus sequence typing (MLST). The set included 263 natural E. coli isolates exposed to various environments and isolated over a 20-year period from humans and animals, as well as 27 fully sequenced strains. Our analyses confirm that there are two largely independent pairs of CRISPR loci (CRISPR1 and -2 and CRISPR3 and -4), each associated with a different type of cas genes (Ecoli and Ypest, respectively), but that each pair of CRISPRs has similar dynamics. Strikingly, the major phylogenetic group B2 is almost devoid of CRISPRs. The majority of genomes analyzed lack Ypest cas genes and contain CRISPR3 with spacers matching Ypest cas genes. The analysis of relatedness between strains in terms of spacer repertoire and the MLST tree shows a pattern where closely related strains (MLST phylogenetic distance of <0.005, corresponding to at least hundreds of thousands of years) often exhibit identical CRISPRs, while more distantly related strains (MLST distance of >0.01) exhibit completely different CRISPRs. This suggests rare but radical turnover of spacers in CRISPRs rather than gradual change. We found no link between the presence, size, or content of CRISPRs and the lifestyle of the strains. Our data suggest that, within the E. coli species, CRISPRs do not have the expected characteristics of a classical immune system.
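The comparison of spacer repertoires between strains can be sketched as a set overlap. This is an assumed, simplified measure (Jaccard similarity) and not necessarily the relatedness measure used in the study. The function name is hypothetical.

```python
def spacer_jaccard(spacers_a, spacers_b):
    """Jaccard similarity between two spacer repertoires.

    Returns None when both repertoires are empty (for example, strains
    without CRISPR spacers), because the ratio is undefined in that case.
    """
    a, b = set(spacers_a), set(spacers_b)
    if not a and not b:
        return None
    return len(a & b) / len(a | b)
```

Under this measure, identical CRISPRs score 1.0 and completely different CRISPRs score 0.0, so the pattern reported above would appear as scores near 1.0 at small MLST distances and near 0.0 at larger ones, with few intermediate values.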
Prokaryotes thrive in spite of the vast number and diversity of their viruses. This partly results from the evolution of mechanisms to inactivate or silence the action of exogenous DNA. Among these, Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are unique in providing adaptive immunity against elements with high local resemblance to genomes of previously infecting agents. Here, we analyze the CRISPR loci of 51 complete genomes of Escherichia and Salmonella. CRISPR occur as two pairs of loci in Escherichia and a single pair in Salmonella, with each pair showing a similar turnover rate, repeat sequence and putative linkage to a common set of cas genes. Yet, phylogeny shows that CRISPR and associated cas genes have different evolutionary histories, the latter being frequently exchanged or lost. In our set, one CRISPR pair seems specialized in plasmids, often matching genes coding for the replication, conjugation and antirestriction machinery. Strikingly, spacers of this pair also match the cognate cas genes, in which case these genes are absent. The unexpectedly high conservation of this anti-CRISPR suggests selection to counteract the invasion of mobile elements containing functional CRISPR/cas systems. There are few spacers in most CRISPR, and these rarely match genomes of known phages. Furthermore, we found that strains that diverged less than 250 thousand years ago show virtually identical CRISPR. The lack of congruence between cas, CRISPR and the species phylogeny, together with the slow pace of CRISPR change, makes CRISPR poor epidemiological markers in enterobacteria. All these observations are at odds with the abundant and dynamic repertoire of spacers expected of an immune system that protects bacteria from phages. Since we observe purifying selection for the maintenance of CRISPR, these results suggest that alternative evolutionary roles for CRISPR remain to be uncovered.
Background: CRISPR/Cas systems allow archaea and bacteria to resist invasion by foreign nucleic acids.
Results: The CRISPR/Cas system in Haloferax recognized six different PAM sequences that could trigger a defense response.
Conclusion: The PAM sequence specificity of the defense response in type I CRISPR systems is more relaxed than previously thought.
Significance: The PAM sequence requirements for interference and adaptation appear to differ markedly.
The clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated (Cas) system provides adaptive and heritable immunity against foreign genetic elements in most archaea and many bacteria. Although this system is widespread and diverse with many subtypes, only a few species have been investigated to elucidate the precise mechanisms of defense against viruses or plasmids. Approximately 90% of all sequenced archaea encode CRISPR/Cas systems, but their molecular details have so far only been examined in three archaeal species: Sulfolobus solfataricus, Sulfolobus islandicus, and Pyrococcus furiosus. Here, we analyzed the CRISPR/Cas system of Haloferax volcanii using a plasmid-based invader assay. Haloferax encodes a type I-B CRISPR/Cas system with eight Cas proteins and three CRISPR loci for which the identity of protospacer adjacent motifs (PAMs) was unknown until now. We identified six different PAM sequences, any one of which is required upstream of the protospacer to permit target DNA recognition. This is only the second archaeon for which PAM sequences have been determined, and the first CRISPR group with such a high number of PAM sequences. Cells could survive the plasmid challenge if their CRISPR/Cas system was altered or defective, e.g. by deletion of the cas gene cassette. Experimental PAM data were supplemented with bioinformatics data on Haloferax and Haloquadratum.
Archaea; Microbiology; RNA; RNA Metabolism; RNA Processing; CRISPR/Cas; Haloferax volcanii; PAM
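The requirement for a PAM upstream of the protospacer can be sketched as a simple check. The sketch makes several assumptions: matches are exact and on one strand only, "upstream" means the 5′ side of the sequence as written, and the PAM is three nucleotides long. The function name and the motif set in the usage note are placeholders, not the PAMs determined in the study.

```python
def find_targets(target, spacer, pams, pam_len=3):
    """List (start, upstream_motif, has_pam) for every exact spacer match.

    Assumptions for illustration: single strand, exact matching, and the
    PAM read as the pam_len bases immediately 5' of the match.
    """
    hits = []
    start = target.find(spacer)
    while start != -1:
        # Bases directly upstream of the match (shorter near the sequence start).
        upstream = target[max(0, start - pam_len):start]
        hits.append((start, upstream, upstream in pams))
        start = target.find(spacer, start + 1)
    return hits
```

For example, with the placeholder set `{"AAA"}`, a protospacer preceded by `AAA` would be flagged as recognizable, while an identical protospacer preceded by `GGT` would not.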
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR), together with associated genes (cas), form the CRISPR–cas adaptive immune system, which can provide resistance to viruses and plasmids in bacteria and archaea. Here, we use mathematical models, population dynamic experiments, and DNA sequence analyses to investigate the host–phage interactions in a model CRISPR–cas system, Streptococcus thermophilus DGCC7710 and its virulent phage 2972. At the molecular level, the bacteriophage-immune mutant bacteria (BIMs) and CRISPR–escape mutant phage (CEMs) obtained in this study are consistent with those anticipated from an iterative model of this adaptive immune system: resistance by the addition of novel spacers and phage evasion of resistance by mutation in matching sequences or flanking motifs. While CRISPR BIMs were readily isolated and CEMs generated at high rates (frequencies in excess of 10⁻⁶), our population studies indicate that there is more to the dynamics of phage–host interactions and the establishment of a BIM–CEM arms race than predicted from existing assumptions about phage infection and CRISPR–cas immunity. Among the unanticipated observations are: (i) the invasion of phage into populations of BIMs resistant by the acquisition of one (but not two) spacers, (ii) the survival of sensitive bacteria despite the presence of high densities of phage, and (iii) the maintenance of phage-limited communities due to the failure of even two-spacer BIMs to become established in populations with wild-type bacteria and phage. We attribute (i) to incomplete resistance of single-spacer BIMs. Based on the results of additional modeling and experiments, we postulate that (ii) and (iii) can be attributed to the phage infection-associated production of enzymes or other compounds that induce phenotypic phage resistance in sensitive bacteria and kill resistant BIMs. We present evidence in support of these hypotheses and discuss the implications of these results for the ecology and (co)evolution of bacteria and phage.
The evidence that the CRISPR regions of the genomes of archaea and bacteria play a role in the ecology and (co)evolution of these microbes and their viruses is overwhelming: (i) the spacers (variable sequences of 26–72 bp of DNA between the repeats of this region) of these prokaryotes are homologous to the DNA of viruses in their communities; (ii) experimentally, the acquisition and incorporation of spacers of viral DNA can protect these organisms from subsequent infection by these viruses; (iii) experimentally, viruses evade this immunity by mutation in homologous protospacers or protospacer-adjacent motifs (PAMs). Not so clear are the nature and magnitude of the role CRISPR plays in this ecology and evolution. Here, we use mathematical models, experiments with Streptococcus thermophilus and the phage 2972, and DNA sequence analyses to explore the contribution of CRISPR–cas immunity to the ecology and (co)evolution of bacteria and their viruses. The results of this study suggest that the contribution of CRISPR to the ecology of bacteria and phage is more modest and limited, and the conditions for a CRISPR–mediated coevolutionary arms race between these organisms more restrictive, than anticipated from models based on the canonical view of phage infection and CRISPR–cas immunity.
CRISPR-Cas systems are RNA-based immune systems that protect prokaryotes from invaders such as phages and plasmids. In adaptation, the initial phase of the immune response, short foreign DNA fragments are captured and integrated into host CRISPR loci to provide heritable defense against encountered foreign nucleic acids. Each CRISPR contains a ∼100–500 bp leader element that typically includes a transcription promoter, followed by an array of captured ∼35 bp sequences (spacers) sandwiched between copies of an identical ∼35 bp direct repeat sequence. New spacers are added immediately downstream of the leader. Here, we have analyzed adaptation to phage infection in Streptococcus thermophilus at the CRISPR1 locus to identify cis-acting elements essential for the process. We show that the leader and a single repeat of the CRISPR locus are sufficient for adaptation in this system. Moreover, we identified a leader sequence element capable of stimulating adaptation at a dormant repeat. We found that sequences within 10 bp of the site of integration, in both the leader and repeat of the CRISPR, are required for the process. Our results indicate that information at the CRISPR leader-repeat junction is critical for adaptation in this Type II-A system and likely other CRISPR-Cas systems.
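The locus layout and leader-proximal insertion described above can be sketched as a small data structure. This is a toy model, not a description of the molecular mechanism. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CrisprArray:
    """Toy model of a CRISPR locus: leader, then repeat-spacer units."""
    leader: str
    repeat: str
    spacers: list = field(default_factory=list)  # index 0 is leader-proximal

    def acquire(self, new_spacer):
        # New spacers enter immediately downstream of the leader,
        # so the newest spacer always sits at index 0.
        self.spacers.insert(0, new_spacer)

    def sequence(self):
        # Leader, first repeat, then one (spacer + repeat) unit per spacer.
        units = "".join(spacer + self.repeat for spacer in self.spacers)
        return self.leader + self.repeat + units
```

An empty array in this model consists of the leader plus a single repeat, the minimal configuration reported above as sufficient for adaptation. Reading `spacers` from index 0 onward gives acquisition order from newest to oldest.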
Explorations of human microbiota have provided substantial insight into microbial community composition; however, little is known about interactions between various microbial components in human ecosystems. In response to the powerful impact of viral predation, bacteria have acquired potent defenses, including an adaptive immune response based on the CRISPR/Cas system. To improve our understanding of the interactions between bacteria and their viruses in humans, we analyzed 13,977 streptococcal CRISPR sequences and compared them with 2,588,172 virome reads in the saliva of 4 human subjects over 17 months. We found a diverse array of viruses and CRISPR spacers, many of which were specific to each subject and time point. There were numerous viral sequences matching CRISPR spacers; these matches were highly specific for salivary viruses. We determined that spacers and viruses coexist at the same time, which suggests that streptococcal CRISPR/Cas systems are under constant pressure from salivary viruses. CRISPRs in some subjects were just as likely to match viral sequences from other subjects as they were to match viruses from the same subject. Because interactions between bacteria and viruses help to determine the structure of bacterial communities, CRISPR-virus analyses are likely to provide insight into the forces shaping the human microbiome.
CRISPRs; Saliva; Virus; Virome; Microbiome
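The comparison of CRISPR spacers with virome reads can be sketched as a substring search on both strands. This sketch assumes exact matches and an unambiguous A/C/G/T alphabet, whereas analyses of this kind typically use alignment tools that tolerate mismatches. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def spacers_matching_reads(spacers, reads):
    """Return the set of spacers found exactly in any read, on either strand."""
    matched = set()
    for spacer in spacers:
        rc = reverse_complement(spacer)
        if any(spacer in read or rc in read for read in reads):
            matched.add(spacer)
    return matched
```

Running this per subject and time point, and then against reads from other subjects, would give the kind of within-subject versus between-subject comparison described above. The nested loop is quadratic and suits only small illustrative inputs.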
The CRISPR (clustered regularly interspaced short palindromic repeats)–Cas adaptive immune system is an important defense system in bacteria, providing targeted defense against invasions of foreign nucleic acids. CRISPR–Cas systems consist of CRISPR loci and cas (CRISPR-associated) genes: sequence segments of invaders are incorporated into host genomes at CRISPR loci to generate specificity, while adjacent cas genes encode proteins that mediate the defense process. We pursued an integrated approach to identifying putative cas genes from genomes and metagenomes, combining similarity searches with genomic neighborhood analysis. Application of our approach to bacterial genomes and human microbiome datasets allowed us to significantly expand the collection of cas genes: the sequence space of the Cas9 family, the key player in the recently engineered RNA-guided platforms for genome editing in eukaryotes, is expanded at least two-fold with metagenomic datasets. We found genes in cas loci encoding other functions, for example, toxins and antitoxins, confirming the recently discovered potential for coupling between adaptive immunity and the dormancy/suicide systems. We further identified 24 novel Cas families; one novel family contains 20 proteins, all identified from the human microbiome datasets, illustrating the importance of metagenomics projects in expanding the diversity of cas genes.
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) loci provide prokaryotes with adaptive immunity against viruses and other mobile genetic elements. CRISPR arrays can be transcribed and processed into small crRNA molecules, which are then used by the cell to target the foreign nucleic acid. Since spacers are accumulated by active CRISPR/Cas systems, the sequences of these spacers provide a record of the past "infection history" of the organism.
Here we analyzed all currently known spacers present in archaeal genomes and identified their source by DNA similarity. While nearly 50% of archaeal spacers matched mobile genetic elements, such as plasmids or viruses, several others matched chromosomal genes of other organisms, primarily other archaea. Thus, networks of gene exchange between archaeal species were revealed by the spacer analysis, including many cases of inter-genus and inter-species gene transfer events. Spacers that recognize viral sequences tend to be located further away from the leader sequence, implying that there exists a selective pressure for their retention.
CRISPR spacers provide direct evidence for extensive gene exchange in archaea, especially within genera, and support the current dogma where the primary role of the CRISPR/Cas system is anti-viral and anti-plasmid defense.
Open peer review
This article was reviewed by: Profs. W. Ford Doolittle, John van der Oost, Christa Schleper (nominated by board member Prof. J Peter Gogarten)
CRISPR; Lateral Gene transfer; Horizontal gene transfer; viruses; archaea; competence
CRISPR-Cas systems (clustered regularly interspaced short palindromic repeats/CRISPR-associated proteins) are found in 90% of archaea and about 40% of bacteria. In these systems, CRISPR arrays comprise short, almost unique sequences called spacers that are interspersed with conserved palindromic repeats. These systems provide adaptive immunity and help fight non-self DNA such as integrative and conjugative elements, plasmids, and phages. In Streptococcus agalactiae, a bacterium implicated in colonization and infections in humans since the 1960s, two CRISPR-Cas systems have been described. A type II-A system, characterized by proteins Cas9, Cas1, Cas2, and Csn2, is ubiquitous, and a type I–C system, with the Cas8c signature protein, is present in about 20% of the isolates. Unlike type I–C, which appears to be non-functional, type II-A appears fully functional. Here we studied type II-A CRISPR-cas loci from 126 human isolates of S. agalactiae belonging to different clonal complexes that represent the diversity of the species and that have been implicated in colonization or infection. The CRISPR-cas locus was analyzed at both the spacer and repeat levels. Major distinctive features were identified according to the phylogenetic lineages previously defined by multilocus sequence typing, especially for sequence type (ST) 17, which is considered hypervirulent. Among other idiosyncrasies, ST-17 shows a significantly lower number of spacers than other lineages. This characteristic could reflect the peculiar virulence or colonization specificities of this lineage.
Streptococcus agalactiae; CRISPR-Cas; phylogeny; ST-17; typing
CRISPR-Cas adaptive immunity systems of bacteria and archaea insert fragments of virus or plasmid DNA as spacer sequences into CRISPR repeat loci. Processed transcripts encompassing these spacers guide the cleavage of the cognate foreign DNA or RNA. Most CRISPR-Cas loci, in addition to recognized cas genes, also include genes that are not directly implicated in spacer acquisition, CRISPR transcript processing or interference. Here we comprehensively analyze sequences, structures and genomic neighborhoods of one of the most widespread groups of such genes that encode proteins containing a predicted nucleotide-binding domain with a Rossmann-like fold, which we denote CARF (CRISPR-associated Rossmann fold). Several CARF protein structures have been determined but functional characterization of these proteins is lacking. The CARF domain is most frequently combined with a C-terminal winged helix-turn-helix DNA-binding domain and “effector” domains most of which are predicted to possess DNase or RNase activity. Divergent CARF domains are also found in RtcR proteins, sigma-54 dependent regulators of the rtc RNA repair operon. CARF genes frequently co-occur with those coding for proteins containing the WYL domain with the Sm-like SH3 β-barrel fold, which is also predicted to bind ligands. CRISPR-Cas and possibly other defense systems are predicted to be transcriptionally regulated by multiple ligand-binding proteins containing WYL and CARF domains which sense modified nucleotides and nucleotide derivatives generated during virus infection. We hypothesize that CARF domains also transmit the signal from the bound ligand to the fused effector domains which attack either alien or self nucleic acids, resulting, respectively, in immunity complementing the CRISPR-Cas action or in dormancy/programmed cell death.
CRISPR; Rossmann fold; beta barrel; DNA-binding proteins; phage defense