CRISPR-Cas adaptive immune systems in prokaryotes boast a diversity of protein families and mechanisms of action, where most systems rely on protospacer-adjacent motifs (PAMs) for DNA target recognition. Here, we developed an in vivo, positive, and tunable screen termed PAM-SCANR (PAM screen achieved by NOT-gate repression) to elucidate functional PAMs as well as an interactive visualization scheme termed the PAM wheel to convey individual PAM sequences and their activities. PAM-SCANR and the PAM wheel identified known functional PAMs while revealing complex sequence-activity landscapes for the Bacillus halodurans I-C (Cascade), Escherichia coli I-E (Cascade), Streptococcus thermophilus II-A CRISPR1 (Cas9), and Francisella novicida V-A (Cpf1) systems. The PAM wheel was also readily applicable to existing high-throughput screens and garnered insights into SpyCas9 and SauCas9 PAM diversity. These tools offer powerful means of elucidating and visualizing functional PAMs toward accelerating our ability to understand and exploit the multitude of CRISPR-Cas systems in nature.
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and their associated Cas (CRISPR associated) proteins form widespread adaptive immune systems in prokaryotes that have been harnessed as ubiquitous tools in biotechnology and medicine (Barrangou et al., 2007; Doudna and Charpentier, 2014; Hsu et al., 2014; van der Oost et al., 2014). Unlike adaptive immune systems in eukaryotes, CRISPR-Cas systems rely on the base-pairing potential of RNA guides for specific target recognition (Brouns et al., 2008; Garneau et al., 2010; Marraffini and Sontheimer, 2008). Guide RNAs encoded within the CRISPR array employ their ~20 – 30 nucleotide spacer region to base pair with complementary protospacer sequences, leading the system’s Cas effector proteins to cleave or degrade these targets (Garneau et al., 2010; Gasiunas et al., 2012; Hale et al., 2009; Jinek et al., 2013; Jore et al., 2011; Westra et al., 2012). CRISPR-Cas systems can also acquire pieces of foreign genetic material as new spacers to confer immunity against future infections. The programmable and multiplexable nature of DNA/RNA targeting has led to a myriad of applications ranging from gene therapy to antimicrobials (Bikard et al., 2014; Citorik et al., 2014; Cong et al., 2013; Gomaa et al., 2014; Hilton et al., 2015; Mali et al., 2013).
CRISPR-Cas systems have proven to be remarkably diverse despite their common role in adaptive immunity. The current classification system defines two classes, six main types, and nineteen subtypes (Makarova et al., 2015; Shmakov et al., 2015). Each grouping is distinguished by its Cas proteins and by the mechanisms of RNA processing, target recognition, and target destruction. Across the six types, Type II systems have garnered the most attention to-date because their machinery can be packaged into a portable two-component system (Jinek et al., 2012). However, Type I systems are the most abundant and widespread in nature and can degrade DNA (Brouns et al., 2008; Makarova et al., 2015). Type III systems can bind and cleave DNA and/or RNA (Hale et al., 2009; Samai et al., 2015; Staals et al., 2014; Tamulaitis et al., 2014), and Type V and putative Type VI systems offer potential alternatives to Type II systems and the Cas9 effector protein (Shmakov et al., 2015; Zetsche et al., 2015). While our understanding and use of CRISPR-Cas systems has centered on a few exemplary model systems, a plethora of other CRISPR-Cas systems remains to be characterized and harnessed.
When characterizing new CRISPR-Cas systems, one of the greatest challenges is elucidating rules for guide RNA design and target selection. Aside from complementarity between the target and the spacer portion of the guide RNA, the protospacer must also be flanked on one side by defined sequences (Deveau et al., 2008). These flanking sequences allow CRISPR-Cas immune systems to differentiate between self (the CRISPR array spacer) and non-self (the invader DNA) and have been heavily implicated in spacer acquisition (Deveau et al., 2008; Heler et al., 2015; Horvath et al., 2008; Marraffini and Sontheimer, 2010; Mojica et al., 2009). For all characterized systems besides Type III systems, this flanking sequence (called the protospacer-adjacent motif or PAM) initially drives DNA interrogation by the Cas effector proteins prior to DNA unwinding and base pairing with the loaded guide RNA (Marraffini and Sontheimer, 2010; Semenova et al., 2011; Sternberg et al., 2014; Westra et al., 2012). In the absence of a PAM, the Cas proteins cannot recognize the target, even if it is perfectly complementary to the spacer. The PAM thus plays a central role in target selection, both in the context of host immunity and CRISPR-based technologies.
The centrality of the PAM has spurred the development of multiple approaches to elucidate functional PAM sequences necessary for DNA recognition for popular CRISPR-Cas systems. Originally, bioinformatics analysis of CRISPR spacers was used to identify matching bacteriophage and plasmid sequences (Horvath et al., 2008; Mojica et al., 2009). This technique can identify a set of putative PAMs, although it remains limited by the availability of matching phage or plasmid DNA sequences in genomic databases, and obtained hits may include mutated escape-PAMs. More recent efforts have developed high-throughput, experimental screens to determine functional PAMs based on the depletion of a target plasmid or on the introduction of a double-stranded break in vitro (Jiang et al., 2013a; Karvelis et al., 2015; Pattanayak et al., 2013). However, plasmid removal screens are based on an irreversible binary event, implicitly measure the frequency of escape from killing, and require high library coverage to quantitatively identify depleted PAM sequences. Separately, in vitro DNA cleavage screens require the purification of active protein-RNA complexes and can be highly sensitive to the assay conditions (Karvelis et al., 2015). Furthermore, in vitro screens that rely on adaptor ligation are incompatible with Type I systems that cleave and degrade DNA (Pattanayak et al., 2013; Zetsche et al., 2015). These shortcomings highlight the need for screens that overcome many of these limitations.
We sought to develop a high-throughput, in vivo screen with two distinct features: applicability across PAM-dependent CRISPR-Cas systems and the generation of a positive signal for functional PAMs (Figure 1A, Movie S1). To develop a broadly applicable screen, we utilized gene repression as the basis for CRISPR function. Prior work demonstrated that Type I and Type II CRISPR-Cas systems lacking endonuclease activity could function as transcriptional regulators, either by removing the Cas3 protein from the Type I-E system in E. coli or by mutating the catalytic residues of the HNH and RuvC endonuclease domains in Type II-A Cas9 proteins (Bikard et al., 2013; Jinek et al., 2012; Luo et al., 2014; Qi et al., 2013; Rath et al., 2014). In either case, the Cas protein(s) tightly bind to but do not cleave the target DNA, thereby interfering with transcription at the bound locus. Separately, the Cpf1 effector protein for the Type V-A system in Francisella novicida could be converted into a DNA-binding protein by mutating the first or second RuvC domain (Zetsche et al., 2015), although Cpf1 has yet to be used for gene regulation. Identifying RuvC and HNH domains or cas3 genes has proven straightforward using bioinformatics analyses, lending to the conversion of uncharacterized CRISPR-Cas systems into gene repressors (Jackson et al., 2014; Jinek et al., 2012).
To associate gene repression with a positive signal, we constructed a simple genetic circuit termed a NOT gate (Figure 1). The NOT gate is based on the CRISPR-Cas system blocking the −35 element within the promoter upstream of lacI, and the LacI repressor blocking the promoter of the green fluorescent protein (GFP). The PAM library is placed upstream of the −35 element to minimally impact the transcription or translation of lacI. By targeting the top or bottom strand of the lacI promoter, PAMs located on the 5’ end (Type I, V, VI systems) or on the 3’ end (Type II systems) of the protospacer can be assessed (Figure S1A) (Horvath et al., 2008; Shmakov et al., 2015; Westra et al., 2013; Zetsche et al., 2015). Based on the configuration of the NOT gate, only a functional PAM would lead to lacI repression and reporter expression. Fluorescent cells can then be isolated through fluorescence-activated cell sorting (FACS). The screen is performed in an E. coli strain stripped of lacI-lacZ and its endogenous CRISPR-Cas system to avoid crosstalk. We termed the resulting screen PAM-SCANR (PAM screen achieved by NOT-gate repression). By allowing the isolation of fluorescent cells, PAM-SCANR affords two modes of screening: comprehensive screening based on next-generation sequencing of pre-sorted and post-sorted PAM libraries, and individual screening based on single sequencing of sorted fluorescent clones (Figure 1B). Furthermore, the stringency of PAM-SCANR can be tuned using intermediate concentrations of Isopropyl β-D-1-thiogalactopyranoside (IPTG) to titrate active LacI repressors within the NOT gate, allowing the detection of weak functional PAMs or the use of poorly expressed or weakly active systems (Figures 1C and S2).
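The logic of the NOT gate can be sketched as a Boolean model. This is an idealization under stated assumptions: the real circuit is analog, and intermediate IPTG concentrations tune it continuously rather than switching it. The function and parameter names are illustrative and do not come from the original work.

```python
def not_gate_reports_gfp(pam_is_functional: bool,
                         iptg_saturating: bool = False) -> bool:
    """Boolean idealization of the PAM-SCANR circuit (illustrative names)."""
    # A functional PAM lets the CRISPR-Cas complex bind and block the
    # promoter upstream of lacI, so LacI is not produced.
    laci_expressed = not pam_is_functional
    # IPTG inactivates LacI; at saturating levels any LacI that is made
    # can no longer repress the GFP promoter.
    laci_active = laci_expressed and not iptg_saturating
    # GFP is expressed only when no active LacI is present.
    return not laci_active


if __name__ == "__main__":
    for functional in (True, False):
        print(functional, not_gate_reports_gfp(functional))
```

In this simplified view, saturating IPTG makes every cell fluorescent regardless of the PAM, which is why the screen relies on intermediate concentrations to lower stringency without losing discrimination.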
High-throughput PAM screens yield lists of all sequences within the screened library along with their relative enrichment or depletion. Each list is normally conveyed in a compact visualization scheme called a sequence logo that reports the conservation of a given nucleotide at each position (Horvath et al., 2008). While sequence logos have been the standard for conveying functional PAMs (Horvath et al., 2008; Shmakov et al., 2015), they inherently sacrifice two crucial details: the individual sequences representing functional PAMs and the relative activity of each sequence. Although these details are dispensable for a single, consensus sequence, emerging evidence suggests that CRISPR PAMs are more complex. For instance, the Streptococcus pyogenes Cas9 recognizes a weak NAG PAM in addition to the standard NGG PAM (Jiang et al., 2013b; Jinek et al., 2012), while the S. thermophilus CRISPR1 Cas9 recognizes multiple sequences that deviate from the consensus NNAGAAW PAM (where W is A or T) (Esvelt et al., 2013; Horvath et al., 2008; Kleinstiver et al., 2015). What remains to be developed is a standard means of conveying functional PAM sequences and their activities.
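The information lost by a sequence logo can be illustrated with a toy example. The sketch below computes per-position nucleotide counts, the quantity a logo is built from, and shows that two different sets of sequences can give identical counts. The 2-nt sequences are made up for illustration and are not taken from any screen.

```python
from collections import Counter


def position_frequencies(seqs):
    """Per-position nucleotide counts for equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    return [Counter(s[i] for s in seqs) for i in range(length)]


if __name__ == "__main__":
    set_one = ["AG", "GA"]  # toy sequences, not real PAMs
    set_two = ["AA", "GG"]  # different sequences, same column counts
    print(position_frequencies(set_one) == position_frequencies(set_two))
```

Because both sets yield the same counts at each position, a logo alone cannot tell which individual sequences were functional.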
We sought to develop a means of representing functional PAMs that preserves both individual sequences and enrichment scores (Figure 2A). We selected interactive Krona plots that present hierarchical node-link diagrams (Ondov et al., 2011). By processing individual sequences and enrichment scores, the Krona plot outputs a wheel similar to the codon wheel used to relate codons and amino acids. We oriented the wheel to be read from the inner to the outer ring, conveying the PAM’s nucleotide sequence moving away from the protospacer. Besides capturing the PAM, the Krona plot also conveys the relative enrichment of a given sequence as directly proportional to the area of the sector. We term the resulting plots PAM wheels. Krona plots are encapsulated in interactive HTML files, allowing interrogation of all sequences regardless of their enrichment score (Krona S1 – S8).
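As a rough sketch of how PAM sequences and scores could be arranged for such a plot, the code below writes lines in the tab-delimited layout accepted by Krona's `ktImportText` tool: a quantity followed by the hierarchy levels from the innermost ring outward. The function name, the `pam_side` parameter, and the example scores are hypothetical, and the authors' actual pipeline (described in their Detailed Protocol) may differ.

```python
def krona_text_lines(scores, pam_side="3prime"):
    """Format {PAM: score} as Krona text-import lines (illustrative helper).

    The innermost ring is the nucleotide closest to the protospacer, so a
    PAM on the 5' side of the protospacer is written in reverse order.
    """
    if pam_side not in ("3prime", "5prime"):
        raise ValueError("pam_side must be '3prime' or '5prime'")
    lines = []
    for pam, score in sorted(scores.items()):
        ordered = pam if pam_side == "3prime" else pam[::-1]
        # One line per PAM: score, then one column per nucleotide ring.
        lines.append("\t".join([f"{score:g}", *ordered]))
    return lines


if __name__ == "__main__":
    # Placeholder scores for illustration only, not measured values.
    for line in krona_text_lines({"AGG": 10.0, "AAG": 2.5}):
        print(line)
```

Writing these lines to a file and passing it to `ktImportText` would be one way to produce an interactive HTML wheel, with sector area proportional to the score in the first column.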
To illustrate the applicability of the PAM wheel, we compiled previously published data sets from high-throughput depletion assays conducted with the Type II-A Cas9 from S. pyogenes, the engineered VQR variant of this Cas9 with altered PAM specificity, and the short Type II-A Cas9 from Staphylococcus aureus (Figure 2) (Kleinstiver et al., 2015; Ran et al., 2015). To convert depletion to a positive output, we inverted the extent of depletion to assign enrichment scores to each PAM sequence in order to generate the PAM wheel and sequence logo. The PAM wheel for the S. pyogenes Cas9 captures the canonical NGG PAM as the most enriched and the weak NAG PAM as the next most enriched sequence (Figure 2A, Krona S1) (Jiang et al., 2013a; Jinek et al., 2012). The PAM wheel also suggested bias against a C in the +1 position and a potential 2-nt gap between the protospacer and the PAM (e.g. TAGG, ATGG). Similarly, the PAM wheel for the VQR variant of the S. pyogenes Cas9 captured the reported NGA PAM but also indicated other functional PAMs (e.g. AGGG, TGCG) and bias toward an A at the +1 position (Figure 2B, Krona S2). These insights contrast with the sequence logos, which merely indicate a preference for a G over an A at the +2 position and for a G at the +3 position for the WT Cas9, and a preference for a G at the +2 position and for an A over a G at the +3 position for the VQR variant.
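The text does not give the exact formula used to invert depletion. A minimal sketch, assuming depletion is expressed as the post-selection frequency divided by the pre-selection frequency and that inversion means taking the reciprocal, might look like the following. Both assumptions, the function name, and the example values are ours.

```python
def depletion_to_enrichment(post_over_pre):
    """Turn depletion ratios (post/pre frequency) into positive scores.

    Assumes inversion means the reciprocal, so a strongly depleted PAM
    (ratio far below 1) receives a high score.
    """
    scores = {}
    for pam, ratio in post_over_pre.items():
        if ratio <= 0:
            raise ValueError(f"ratio for {pam} must be positive")
        scores[pam] = 1.0 / ratio
    return scores


if __name__ == "__main__":
    # Placeholder ratios for illustration only.
    print(depletion_to_enrichment({"TGG": 0.25, "TCC": 1.0}))
```

Under this reading, a PAM whose target plasmid is strongly depleted ends up with a large sector in the wheel, matching the positive-signal convention used for PAM-SCANR data.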
We also generated a PAM wheel based on a high-throughput depletion assay conducted with the compact S. aureus Type II-A Cas9 (Figure 2C, Krona S3–4) (Kleinstiver et al., 2015). The consensus PAM for this Cas9 has been reported to be NNGRR(T), where R represents an A or a G and the parentheses represent a beneficial but not essential nucleotide (Ran et al., 2015). The PAM wheel affirmed the NNGRR motif but also suggested a more nuanced bias for the T at the sixth position that applied to only some PAMs. Interestingly, this bias was more prevalent for non-canonical PAMs (e.g. AGGT, GGCT, GCGT in positions 3 – 6), suggesting that the T could compensate for deviations elsewhere in the PAM.
We first applied PAM-SCANR to the canonical Type I-E CRISPR-Cas system native to E. coli (Figure 3A). Because this system is present in our chassis strain of E. coli, we deleted the endogenous cas3 gene and inserted a constitutive promoter upstream of the adjacent casABCDE operon (Figure S1B). This modification eliminated DNA cleavage, allowing the Cascade complex encoded by the operon to reversibly bind target DNA and block transcription (Luo et al., 2014).
Using E. coli’s native I-E system as a model, we performed the comprehensive screen to elucidate the complete landscape of functional PAM sequences. We selected a 4-nt library to fully capture the canonical 3-nt PAM along with any biases at the −4 position (Figure S3A). No IPTG was used because of the potent gene repression previously observed for this system (Luo et al., 2014). The resulting PAM wheel (Krona S5) and accompanying sequence logo for one of two library replicates are shown in Figure 3A. The screen revealed a remarkably diverse set of functional PAMs including the well-established 3-nt functional PAMs (NGAG, NTAG, NAAG, NAGG, NATG) (Westra et al., 2013). While these PAMs showed no observable bias in the −4 position, the PAM wheel revealed other functional PAMs with biases at the −4 position, indicating that the E. coli I-E system recognizes a longer motif than previously accepted. Furthermore, some functional PAMs were more enriched than others, suggesting that Cascade exhibits ranging PAM preferences. In contrast, the sequence logo gave a limited picture of the PAM spectrum, with a tendency toward an NAAG PAM (Figure 3A).
To initially test the observed sequence biases and variable library enrichment, we cloned single representative functional PAM sequences back into the PAM-SCANR reporter constructs and measured GFP fluorescence by flow cytometry (Figure 3B). We found a strong, linear correlation between the library fold-enrichment and mean GFP fluorescence. The correlation indicated not only that library fold-enrichment was an accurate proxy for the relative activity of a given functional PAM and protospacer but also that functional PAMs can vary widely in their activity.
We next asked how the observed PAM biases extend to a separate protospacer. To answer this question, we measured gene repression associated with representative PAMs by targeting the −35 element of the lacZ promoter controlling GFP in a lacI deletion strain. The PAMs were introduced immediately upstream of the protospacer to minimize interference with transcription. Gene repression was then measured in the E. coli strain lacking cas3 by comparing fluorescence levels for the guide RNA and a non-targeting control. We measured diverse extents of fold-repression that correlated with the enrichment scores from PAM-SCANR (Figure 3C). We also observed notable distinctions from the PAM-SCANR results: the CAAA PAM was the most active with ~1,000-fold repression, and the CATA PAM yielded greater repression than the CAAT PAM. These results confirm that E. coli’s I-E system is sensitive to bias at the −4 position, expanding the accepted PAM length. They also suggest some dependence on the protospacer sequence, in line with recent work with the Type I-E system reporting different PAM preferences for two protospacers (Xue et al., 2015).
Finally, we asked how the observed PAM bias translates into the DNA clearance assays commonly used for PAM determination. To limit any protospacer-specific biases, we recombineered the protospacer and representative PAMs from the PAM-SCANR reporter plasmid into the genome of the E. coli chassis strain and transformed a plasmid constitutively expressing the cas3 gene. We then measured the transformation efficiency of the guide RNA plasmid in comparison to a non-targeting plasmid. Any reduction in the transformation efficiency can be attributed to the lethality of genome targeting and the frequency of escape (Gomaa et al., 2014; Vercoe et al., 2013). Interestingly, we observed similar fold-reductions in the transformation efficiency for all tested functional PAMs including the extremely weak AAAC and AAAA PAMs (Figure 3D), which may be explained by weak and strong PAMs all eliciting irreversible DNA damage and infrequent escape (Gomaa et al., 2014; Jiang et al., 2013b; Vercoe et al., 2013). Targeting was still PAM-dependent, as we observed a negligible fold-reduction for a non-functional PAM (Figure 3D).
We next explored the use of PAM-SCANR with Type II CRISPR-Cas systems and their widely exploited Cas9 effector proteins (Deltcheva et al., 2011). We began with the popular Type II-A Cas9 from S. pyogenes. We used the catalytically dead Cas9 (dCas9) containing point mutations to the RuvC and HNH endonuclease domains (D10A, H840A), which was shown to bind target DNA and block transcription in E. coli (Bikard et al., 2013; Jinek et al., 2012; Qi et al., 2013). We tested three PAM sequences representing the canonical NGG, the weak NAG, and a non-functional PAM within the PAM-SCANR platform (Jiang et al., 2013a; Jinek et al., 2012). Flow cytometry analysis showed a high fluorescence signal for the canonical PAM and similarly low fluorescence signals for the weak and non-functional PAMs (Figure 4A, 0 µM IPTG). Because the weak NAG PAM is known to exhibit some activity (Jiang et al., 2013a), we asked if stringency tuning using sub-saturating concentrations of IPTG could temper LacI activity and help reveal this PAM (Figures 1B and S2). As expected, the sub-saturating concentration of IPTG substantially increased the fluorescence for the NAG PAM over the non-functional PAM (Figure 4A, 10 µM IPTG). PAM-SCANR is therefore also applicable to Type II systems, and stringency tuning with IPTG can allow the detection of weak functional PAMs.
We next applied PAM-SCANR to the canonical Streptococcus thermophilus Type II-A system associated with the CRISPR1 locus. The system’s extensively studied Cas9 protein is substantially shorter than the S. pyogenes Cas9, comes from a generally-regarded-as-safe bacterium, and has one of the longest known and first discovered PAM sequences (NNAGAAW, where W is A or T) (Horvath et al., 2008). Previous depletion-based screens also identified a few functional PAMs that deviate from the consensus sequence (Kleinstiver et al., 2015).
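Consensus PAMs such as NNAGAAW are written with IUPAC degenerate codes. The sketch below shows one way to test whether a concrete sequence falls within such a motif by expanding the codes into a regular expression. Only the codes that appear in this article (N, W, R) are included, and the helper names are illustrative.

```python
import re

# Subset of IUPAC nucleotide codes used in this article.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "W": "[AT]",    # weak pairing
    "N": "[ACGT]",  # any nucleotide
}


def motif_to_regex(motif):
    """Expand a degenerate motif into a compiled regular expression."""
    return re.compile("".join(IUPAC[code] for code in motif))


def matches(motif, seq):
    """Return True if seq is one of the sequences described by motif."""
    return motif_to_regex(motif).fullmatch(seq) is not None


if __name__ == "__main__":
    print(matches("NNAGAAW", "TTAGAAT"))  # W allows a T in the last position
    print(matches("NNAGAAW", "TTAGAAC"))  # C is not covered by W
```

A check like this only says whether a sequence lies inside the consensus; it carries no information about relative activity, which is the gap the PAM wheel is meant to fill.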
We performed individual screening with PAM-SCANR using the catalytically dead version of the S. thermophilus CRISPR1 Cas9 (D9A, H599A) (Bikard et al., 2013; Jinek et al., 2012; Qi et al., 2013). We selected a 5-nt library spanning the consensus PAM sequence (Figure 4B) and applied an intermediate concentration of IPTG to reveal weaker PAMs (Figure S3B). We sorted and plated the ~0.4% GFP-positive cells from the transformed library and subjected cultures of individual colonies to flow cytometry analysis. Sanger sequencing was then performed on 38 of the fluorescent cultures to determine individual PAMs. We obtained four reoccurring sequences (Figures 4B and S4), all of which had been identified previously (Esvelt et al., 2013; Horvath et al., 2008). The mean fluorescence values were relatively scattered but substantially above the values for a non-functional PAM, which we attribute to the state of the cells following cell sorting and subsequent culturing.
To further explore the PAM landscape for this Cas9, we performed comprehensive screening (Figure 4C, Krona S6). The screen identified all four reoccurring PAMs from individual sequencing, which were the most enriched sequences in the PAM wheel. Interestingly, the +7 nucleotide in AGAAN was strongly biased toward a T and away from a C, a bias that was upheld by gene repression (Figure S4C), indicating a more complex canonical PAM than NNAGAAW. The screen also revealed other less enriched sequences that do not lend themselves to a single consensus sequence. One of the slightly enriched sequences was complementary to AGAAT, although this PAM sequence yielded negligible gene repression when targeting the separate protospacer (Figure S4C). This collection of identified functional PAMs illustrates the utility of the individual method with PAM-SCANR and demonstrates that the S. thermophilus CRISPR1 Cas9 preferentially recognizes a hierarchy of PAM sequences.
The phylogenetic classification of CRISPR-Cas systems was recently expanded to six types (Makarova et al., 2015; Shmakov et al., 2015). Of the newly classified types, the Type V system has shown potential as an alternative to Cas9 (Zetsche et al., 2015). Characterization of the Type V-A Cpf1 protein from F. novicida U112 identified an NTTN PAM located on the 5’ end of the protospacer (Zetsche et al., 2015).
To employ the F. novicida Cpf1 protein with PAM-SCANR, we generated a catalytically dead version of the protein that would still bind target DNA. Based on mutational analysis from the original report of Cpf1 (Zetsche et al., 2015), we introduced two point mutations to the RuvC domains that were each implicated in DNA cleavage (D917A, E1006A). We then performed comprehensive screening with the resulting catalytically dead Cpf1 (dCpf1). We selected a 4-nt library immediately flanking the protospacer based on the reported 3-nt PAM (Zetsche et al., 2015) and applied an intermediate concentration of IPTG to reveal any weak PAMs (Figure S3C). Figure 5A shows the resulting PAM wheel (Krona S7) and sequence logo. In line with the previous characterization of Cpf1 (Zetsche et al., 2015), the most enriched sequences fell within the NTTN motif while some lightly enriched sequences matched the reported NCTN PAM. The screen indicated clear biases at the −1 and −4 positions and suggested new, weakly recognized PAMs that are similar to the complement of NTTN.
To validate the observed PAM biases, we measured the extent of gene repression by targeting the lacZ promoter upstream of GFP (Figure 5B). While the GTTC PAM yielded strong repression in comparison to a non-targeting RNA control, mutating the −1 or −4 position greatly impaired repression activity. We also measured limited repression for the previously validated GCTC PAM (Figure 5B) that mirrors its lower enrichment score (Figure 5A) and negligible repression for the non-canonical PAMs (Kleinstiver et al., 2015). These results affirm that strong biases exist within the consensus Cpf1 PAM. Furthermore, we demonstrate that Cpf1 can be readily repurposed for programmable gene regulation.
As a final extension of PAM-SCANR, we tested the ability of this method to identify de novo functional PAMs for the canonical Type I-C system from Bacillus halodurans. Type I-C systems are the second most abundant subtype (Makarova et al., 2015) and only require three proteins (Cas5d, Csd1, Csd2) for Cascade (Makarova et al., 2015; Nam et al., 2012). However, no functional PAMs have been experimentally determined to date.
Previous work demonstrated that Cascade from E. coli’s I-E system could repress gene expression in the absence of the Cas3 endonuclease (Luo et al., 2014; Rath et al., 2014), although this remained to be demonstrated for any other subtype. We therefore imported the three genes for the B. halodurans I-C Cascade into our E. coli strain along with a designed guide RNA (Figure S1A,B). We also introduced a 4-nt library based on the putative NTTC PAM identified through bioinformatics analysis across Type I-C systems (Figure S3D) (Sorek et al., 2013). Stringency tuning with IPTG was necessary to reveal a small, GFP-positive subpopulation (Figure S3D). The resulting PAM wheel (Krona S8) and accompanying sequence logo from the comprehensive method are shown in Figure 6A.
The screen revealed a defined hierarchy of PAM sequences for the B. halodurans Type I-C system (Figure 6A). The most enriched set of PAM sequences (NTTC) matched bioinformatics predictions (Sorek et al., 2013). We also discovered new functional PAMs with incrementally lower enrichments. There was also a clear bias in the −4 position that again depended on the remaining nucleotides of the PAM. Interestingly, the I-C PAMs were the complement of the I-E PAMs (Figures 3B and 6B), suggesting that these systems recognize opposite strands of the PAM or exhibit distinct nucleotide recognition properties.
To validate the identified PAMs and sequence biases, we performed gene repression by targeting the lacZ promoter upstream of GFP (Figure 6B). As part of the validation, we tested the four identified functional PAMs and the observed bias at the −4 position. We found that all tested functional PAMs yielded measurable gene repression. Furthermore, the extent of repression strongly correlated with each PAM’s library fold-enrichment despite targeting a separate protospacer. The repression data therefore validated the functionality of PAMs deviating from previous bioinformatics analyses as well as strong bias at the −4 PAM position. Additionally, we established that the abundant and compact Type I-C CRISPR-Cas systems can be harnessed for gene regulation.
In this work, we developed PAM-SCANR for the rapid identification of functional PAMs across diverse CRISPR-Cas systems. We were able to demonstrate the universality of the screen using five distinct CRISPR-Cas systems from three main types, including two (I-C and V-A) never before used for gene repression. The screen is likely amenable to other types and subtypes of PAM-dependent CRISPR-Cas systems, such as the putative Type VI systems (Shmakov et al., 2015) or the other Type I subtypes that form a stable Cascade complex (Brendel et al., 2014; Wiedenheft et al., 2011), with the potential to identify functional PAMs for the wide assortment of CRISPR-Cas systems found across the prokaryotic world.
We also developed the PAM wheel to effectively capture the diversity and enrichment of functional PAMs without the loss of information. The wheel is based on interactive Krona plots that allow the user to interrogate individual sequences, including those with low enrichment (Krona S1 to S8) (Ondov et al., 2011). PAM wheels are applicable to all existing high-throughput screens and offer a powerful alternative to current means of representing PAMs, including sequence logos that sacrifice the ability to identify individual sequences or activities and abbreviated lists of functional PAMs designated as either strong or weak (Horvath et al., 2008; Shmakov et al., 2015).
Our identification and visualization of functional PAMs across multiple CRISPR-Cas systems revealed remarkable flexibility and bias that better lends to a sequence-activity landscape than a consensus sequence. We found this landscape to be heavily influenced by the periphery of the PAM as well as functional PAMs deviating from the consensus sequence, although other factors such as the protospacer sequence (Xue et al., 2015), the number of consecutive N nucleotides in the PAM following the protospacer (Chen et al., 2014), and possibly weak recognition of the complement of canonical PAMs could contribute. The concentration of Cas proteins may also influence PAM preferences, although this effect has only been reported under in vitro conditions (Karvelis et al., 2015). The landscape also varied widely between even related systems, as illustrated by the relative PAM flexibility and sequences for the E. coli I-E system and the B. halodurans I-C systems. Performing PAM screens on a wider set of systems from different types and subtypes followed by structural analyses will shed light on the molecular basis of PAM recognition and how it varies across the diversity of CRISPR-Cas systems. Aside from motivating structural studies to interrogate PAM recognition, these findings underscore the diversity of functional PAMs that would enable flexible invader targeting and potentially explain biases observed in viral escape of CRISPR-targeting via PAM mutation (Paez-Espino et al., 2013). The remarkable diversity of functional PAMs may confound off-target predictions while also presenting potential opportunities to expand the pool of available DNA targets and to design RNA guides with tailored activities for the next generation of CRISPR-based technologies.
PAM-SCANR relies on gene repression to distinguish functional and non-functional PAMs. One drawback of gene repression is the lack of nuclease activity associated with fully functioning systems, potentially missing PAM-dependent allosteric changes that drive DNA cleavage or Cas3 recruitment (Anders et al., 2014; Hochstrasser et al., 2014). In addition, the variable PAM activities may be related to the ability of the Cas effector proteins to block transcription rather than to tightly bind DNA and elicit cleavage. However, PAM-SCANR faithfully reproduced canonical PAMs across multiple CRISPR-Cas types and systems, suggesting that DNA binding translates well to DNA cleavage. Further exploring the influence of PAMs on DNA binding and cleavage, particularly in the context of CRISPR immunity and genome editing, could help further refine the utility of PAM-SCANR and the identified PAMs for downstream applications.
Table S1 lists all strains, plasmids, and oligonucleotides, and the Extended Experimental Procedures provides an in-depth explanation of the construction of all strains and plasmids. All foreign Cas proteins were inserted into pBAD33 with the constitutive J23108 promoter. For the Type I-E system, a previously reported expression system was used (Luo et al., 2014). The guide RNA plasmids were generated by inserting a CRISPR array or sgRNA downstream of the constitutive J23119 promoter in pBAD18. All experiments were conducted in derivatives of E. coli BW25113 lacking the lacI promoter through the lacZ gene.
E. coli strains were cultured at 37°C and 250 RPM in Luria Bertani (LB) medium or M9 minimal medium with 0.4% glycerol and 0.2% casamino acids or on LB agar. LB medium was used for all experiments except those with the E. coli I-E CRISPR-Cas system due to limited growth in M9 minimal medium with three plasmids. Plasmids were maintained with ampicillin, chloramphenicol, and/or kanamycin as needed. Liquid media was supplemented with IPTG as specified.
Overnight cultures were diluted to an ABS600 of 0.01 and cultured to an ABS600 of ~0.2. Cultures were analyzed on an Accuri C6 Flow Cytometer with CFlow plate sampler (Becton Dickinson). Events were gated based on forward scatter and side scatter, and fluorescence was measured in FL1-H, with at least 30,000 gated events for data analysis.
For gene repression with three plasmids, cultures were inoculated and grown for 16 hours prior to flow cytometry analysis. Fold-repression was calculated as the ratio of the mean fluorescence values for the CRISPR plasmid over that of a non-targeting plasmid.
For the individual screening, overnight cultures of single colonies were diluted to an ABS600 of 0.01 and grown for ~3 hours prior to flow cytometry analysis. Plasmids from cultures exhibiting fluorescence over a non-targeting control were then isolated, and the region flanking the nucleotide library was PCR amplified and submitted for Sanger sequencing.
Overnight cultures were diluted to ABS600 of ~0.01 and grown to an ABS600 of ~0.2 prior to sorting on a MoFlo XDP Cell Sorter (Beckman Coulter). A non-targeting strain for each system was analyzed to establish all necessary gating parameters for sorting fluorescent populations. Cultures were subjected to one or two rounds of sorting resulting in at least 50,000 GFP-positive events. Sorted cells were diluted into LB medium and cultured overnight.
DNA was prepared for sequencing by amplifying the PAM library from plasmid DNA, isolating it from pre-sorted and post-sorted cultures, and indexing it with Nextera barcodes. Amplicons were then subjected to MiSeq sequencing (Illumina). Data was processed using command line text programs to trim out the PAM library and count the number of times each sequence occurred. All raw and processed reads are available through NCBI GEO (Accession #GSE75718). The Detailed Protocol provides an in-depth explanation of the sequencing analysis and PAM wheel generation.
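The trimming and counting steps were performed with command line text programs. An equivalent sketch in Python is given below for illustration. It assumes the PAM library sits immediately after a fixed flanking sequence in each read and that fold-enrichment is the post-sort frequency divided by the pre-sort frequency. The flank, function names, and normalization are our assumptions, and the Detailed Protocol remains the authoritative description.

```python
from collections import Counter


def count_pams(reads, left_flank, pam_len):
    """Count the pam_len nucleotides that follow left_flank in each read."""
    counts = Counter()
    for read in reads:
        start = read.find(left_flank)
        if start == -1:
            continue  # flank not found: skip the read
        pam_start = start + len(left_flank)
        pam = read[pam_start:pam_start + pam_len]
        # Keep only full-length, unambiguous library sequences.
        if len(pam) == pam_len and set(pam) <= set("ACGT"):
            counts[pam] += 1
    return counts


def fold_enrichment(pre, post):
    """Post-sort frequency over pre-sort frequency for each PAM in pre."""
    pre_total = sum(pre.values())
    post_total = sum(post.values())
    if pre_total == 0 or post_total == 0:
        raise ValueError("both libraries need at least one counted read")
    return {
        pam: (post.get(pam, 0) / post_total) / (n_pre / pre_total)
        for pam, n_pre in pre.items()
        if n_pre > 0
    }


if __name__ == "__main__":
    # Toy reads with a made-up flank, for illustration only.
    reads = ["GGTTAAGCC", "GGTTAAGCC", "GGTTCCCCC", "AAAA"]
    print(count_pams(reads, "GGTT", 3))
```

PAMs absent from the pre-sorted library are skipped here because their enrichment is undefined. This is one reason adequate library coverage before sorting matters.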
For the cell killing assays with the I-E CRISPR-Cas system, E. coli strains were modified by λ-red recombination to harbor protospacers and PAMs from the reporter plasmid. The strains were then electroporated with targeting and non-targeting guide RNA plasmids followed by recovery and dilution plating on LB agar. The fold-reduction in the transformation efficiency was calculated as the ratio of the number of transformants for the non-targeting plasmid divided by that of the CRISPR plasmid.
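The fold-reduction calculation described above can be written out as a short worked example. The function name and the colony counts are hypothetical, and the counts are assumed to be already normalized to the same plated volume and dilution.

```python
def fold_reduction(nontargeting_transformants, targeting_transformants):
    """Non-targeting transformants divided by CRISPR-plasmid transformants."""
    if targeting_transformants <= 0:
        # With no surviving colonies the ratio is undefined; a detection
        # limit would be needed instead.
        raise ValueError("targeting count must be positive")
    return nontargeting_transformants / targeting_transformants


if __name__ == "__main__":
    # Placeholder counts for illustration only.
    print(fold_reduction(1000, 10))
```

A value near 1 indicates no detectable targeting, as expected for a non-functional PAM, while larger values indicate more effective killing.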
See Extended Experimental Methods for additional details on all experimental procedures.
We thank Stacie Meaux at Research Square for developing the video summary contained in Movie S1, Brooke McGirr for assistance with cloning and recombineering, and Michelle Luo for devising the PAM-SCANR acronym and for critical reading of the manuscript. We also thank Sarah Schuett at the NCSU CVM Cell Sorting Facility for all FACS work. The BhaloCascade plasmid was a gift from Ailong Ke, and the SpydCas9 plasmid was a gift from Lei Qi (Addgene #44249). The work was supported by funding from the National Science Foundation (CBET-1403135 to C.L.B. and R.B., MCB-1452902 to C.L.B.), the Kenan Institute of Engineering, Technology and Science (to C.L.B.), the National Institutes of Health (5T32GM008776-15 to R.T.L.), and an NCSU undergraduate research grant (to R.A.S.).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
AUTHOR CONTRIBUTIONS
R.T.L. and C.L.B. devised PAM-SCANR. R.T.L., K.R.M., and C.L.B. developed the experiments. R.T.L., K.R.M., R.B., and C.L.B. analyzed the experimental data. R.T.L., K.R.M., R.A.S., R.N.A., and A.A.G. performed the experiments. R.T.L. and A.E.B. generated the PAM wheels and sequence logos. R.T.L., K.R.M., R.B., and C.L.B. wrote the manuscript.