Reviewer's report: 1
W. Ford Doolittle, Dalhousie University.
Reviewer comments
This is a nice little exercise in bioinformatics, showing that there is significant incorporation of non-mobile, non-viral genes as CRISPR spacers. Because the genomic databases are so skewed in terms of taxonomic inclusion, authors are wise to avoid getting very quantitative about their results.
It is interesting to wonder and important to know how these genes get into the recipients. The speculation that many archaea have the equivalent of nonspecific transformation systems seems unavoidable --- how else would they acquire eukaryotic DNA? The latter imports do raise further questions. Since there are relatively few eukaryotic genomes and since so much of so many of them is made up of non-coding sequences that would be expected to lose BLASTN detectability quickly, one might infer that most of the CRISPR sequences that match nothing are actually eukaryotic. Could that be so?
It also seems a bit of a no-brainer that most CRISPR phage hits are to phage infecting other (albeit congeneric) species. If the CRISPR is active, will it not ensure that phages that match it perfectly do not infect effectively? And where the hit is to a chromosomal gene not from the same species, cannot this simply mean that if it were, there would have been an autoimmunity problem?
Author's response:
"It is interesting to wonder and important to know how these genes get into the recipients. The speculation that many archaea have the equivalent of nonspecific transformation systems seems unavoidable --- how else would they acquire eukaryotic DNA?"
We agree with this comment. To make the message more explicit, we have included the term "competence" in the text, as follows: "... take up naked DNA from the environment (i.e. natural competence),"
The latter imports do raise further questions. Since there are relatively few eukaryotic genomes and since so much of so many of them is made up of non-coding sequences that would be expected to lose BLASTN detectability quickly, one might infer that most of the CRISPR sequences that match nothing are actually eukaryotic. Could that be so?
While we think that most CRISPR sequences come from archaeal viruses that are under-represented in the databases, and not from eukaryotes, we really liked this comment. We have therefore added the sentence: "Since eukaryotic sequence space is still poorly explored, and eukaryotic genomes contain a lot of non-coding material that can rapidly lose sequence similarity, it is possible that many more spacers are eukaryote-derived,"
It also seems a bit of a no-brainer that most CRISPR phage hits are to phage infecting other (albeit congeneric) species. If the CRISPR is active, will it not ensure that phages that match it perfectly do not infect effectively?
Nevertheless, there are cases where a few strains of the same species have been sequenced, and there we do observe hits within the species. We realized that the figure legend did not state this, and have added the sentence "Within species matches are not shown."
And where the hit is to a chromosomal gene not from the same species, cannot this simply mean that if it were, there would have been an autoimmunity problem?
Correct, and thus the adverb "Surprisingly" was removed from the sentence, which now reads: "Most spacers have best matches outside the specific CRISPR..."
Reviewer's report: 2
John van der Oost, Wageningen University
Brodt and co-workers report on an in silico analysis of CRISPR spacers in archaeal genomes. Whereas half of the spacers match with plasmids or viruses, a substantial number of matches have been detected with chromosomal genes of other organisms, mainly archaea. This observation has been interpreted as evidence for recombination events. The spacers with matches to viruses tend to be located more towards the 3' end of the CRISPR, indicative of the selective pressure to retain them. This study adds some details to the rapidly developing CRISPR field. The text requires serious polishing and more careful phrasing.
Comments
1) CRISPR/Cas is the name of the defense system, CRISPR is the repetitive array;
when referring to gene(s), cas should be in italics;
2) P.1, Background, line 3: should be: CRISPR arrays can be transcribed and processed into small crRNA molecules.
3) P.1, Background, line 4: should be: foreign nucleic acid for degradation.
4) P.1, Conclusions, line 3: should be: anti-viral and anti-plasmid defense.
5) P.2, Background, line 14-17: rephrase since archaeal systems have not been studied in enough detail - there may be a "seed" sequence there as well, as recently described for the E. coli system;
6) P.3, Results & Discussion, line 3-4: higher numbers of spacers - provide numbers in text
7) P.3, Results & Discussion, line 13: CRISPR should be: CRISPR-associated.
8) P.3, Results & Discussion, line 22-31: Mention the different types of recombination: Conjugation, Transfection, and Transformation.
9) P.3, Results & Discussion, last line: provide examples of CRISPR loci that reside on plasmid (add Suppl Table).
10) P.4, Results & Discussion, line 14-16: identity is not the only thing that matters, so does presence of PAM motif - include this in analysis and discussion;
11) P.5, Results & Discussion, line 3: substitute "organism" by "related virus";
Author's response:
1) CRISPR/Cas is the name of the defense system, CRISPR is the repetitive array;
when referring to gene(s), cas should be in italics;
Fixed
2) P.1, Background, line 3: should be: CRISPR arrays can be transcribed and processed into small crRNA molecules
Fixed, in both abstract and main text.
3) P.1, Background, line 4: should be: foreign nucleic acid for degradation.
Fixed
4) P.1, Conclusions, line 3: should be: anti-viral and anti-plasmid defense.
Fixed
5) P.2, Background, line 14-17: rephrase since archaeal systems have not been studied in enough detail - there may be a "seed" sequence there as well, as recently described for the E. coli system;
Rephrased to: "Archaeal CRISPR/Cas has been shown to confer almost 100% immunity in cases where spacers were identical to the target sequence, but partial matches also provide substantial immunity in archaea [12]. A short seed sequence that requires a perfect match has been recently discovered in bacteria [6], but whether such seed also exists in archaea, remains to be determined"
6) P.3, Results & Discussion, line 3-4: higher numbers of spacers - provide numbers in text.
Done
7) P.3, Results & Discussion, line 13: CRISPR should be: CRISPR-associated
Fixed
8) P.3, Results & Discussion, line 22-31: Mention the different types of recombination: Conjugation, Transfection, and Transformation.
Done
9) P.3, Results & Discussion, last line: provide examples of CRISPR loci that reside on plasmid (add Suppl Table).
Done
10) P.4, Results & Discussion, line 14-16: identity is not the only thing that matters, so does presence of PAM motif - include this in analysis and discussion;
We thank the reviewer for pointing out this possibility, which we had overlooked. We have revised the text to: "The fact that this self-matching spacer is tolerated in this organism implies that either the proto-spacer associated motif (PAM) has been mutated in the chromosomal gene sequence rendering the gene immune [3] or that the chromosomal CRISPR/Cas system had become deactivated in this archaeon." We also performed some analysis. Assuming that the Halomicrobium CRISPR/Cas is CRISPR group 1, like that of Haloarcula, to which some of its cas genes show some similarity, we would expect the PAM to be (t/a)GG (based on Mojica et al., Microbiology 2009). We find CCC in the chromosome instead, which should protect the gene from cleavage. This is very nice, but because of the assumptions and guesswork involved we would rather not include these results in the text.
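The kind of PAM comparison described in this response can be illustrated with a minimal sketch. It takes the expected motif (t/a)GG from the response itself; the function names, the use of a pre-extracted 3-nt flank, and the decision to test both strands are illustrative assumptions and not the authors' actual procedure.

```python
import re

# Expected PAM quoted in the response for CRISPR group 1: (t/a)GG.
# Treating it as a strict 3-nt pattern on a pre-extracted flank is an assumption.
EXPECTED_PAM = re.compile(r"[TA]GG")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def flank_has_pam(flank: str) -> bool:
    """True if the flank matches the expected PAM on either strand.

    Checking both strands is a conservative assumption, since the
    orientation of the flank relative to the protospacer is not given.
    """
    flank = flank.upper()
    return bool(
        EXPECTED_PAM.fullmatch(flank)
        or EXPECTED_PAM.fullmatch(reverse_complement(flank))
    )


# The chromosomal flank reported in the response is CCC.
print(flank_has_pam("CCC"))  # no match on either strand
print(flank_has_pam("TGG"))  # matches the expected motif
```

Under these assumptions a CCC flank matches the expected motif on neither strand, which is consistent with the reading that the chromosomal copy would escape cleavage.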
11) P.5, Results & Discussion, line 3: substitute "organism" by "related virus";
Fixed
Reviewer's report: 3
Christa Schleper, University of Vienna (nominated by J. Peter Gogarten, University of Connecticut)
The authors present an analysis of archaeal CRISPR spacers searching for their potential origin by DNA similarity searches in the public databases. As expected and shown in earlier studies, the authors demonstrate that most spacers are homologous to sequences of viruses or other mobile genetic elements, but they find also a considerable fraction of matches to classical chromosomally derived (or housekeeping) genes in other organisms. By superimposing the identified protospacers with taxonomic clusters, the authors give an overview of the potential sources of the archaeal spacer sequences which indicates potential routes of horizontal gene transfer.
The manuscript is very clearly and concisely written and it is inspiring. For example, I got the feeling that the frequent exchange of DNA among Sulfolobales, as often insinuated through conjugative and comparative genomic studies is somewhat reflected in the CRISPR world. It is also of interest to see, that several viruses in archaea might have a broader host range than expected.
Of course a potential chromosomal DNA transfer from eukaryotes, even humans, to archaea should be a rather rare event(!). However, I think that the authors should consider (and discuss) another potential mechanism to explain the general picture: Spacers, that were perhaps originally self-directed against the own genome and thus raised autoimmunity, might have caused selection for organisms, that have lost the respective target gene. In that case, the spacer, which remained, was originally derived from a gene of the own chromosome, but in today's blast searches the next best match appears to another (maybe even distantly related) organism. The authors might find out about this possibility, by checking if another orthologue of that gene is found in the chromosome of the spacer-carrying organism or not. Furthermore, I think it is of importance, to include in the analyses more than just the best BLAST matches. If e.g. a Methanococcus spacer matches a human sequence best, but the second best match with an almost identical e-value is to a bacterium, then I would be far less convinced of a eukaryotic-archaeal transfer.
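The best-versus-second-best comparison suggested here could be sketched as follows. The data layout (a list of taxon and E-value pairs), the function name, and the cutoff of one order of magnitude between E-values are all illustrative assumptions; the report does not specify a threshold.

```python
import math


def best_hit_is_ambiguous(hits, max_log10_gap=1.0):
    """Flag a spacer whose best hit is closely followed by a hit
    from a different taxonomic group.

    hits: list of (taxon_group, evalue) tuples for one spacer.
    max_log10_gap: hypothetical cutoff; E-values within this many
        orders of magnitude are treated as "almost identical".
    """
    if not hits:
        return False
    ranked = sorted(hits, key=lambda h: h[1])
    best_group, best_e = ranked[0]
    for group, evalue in ranked[1:]:
        if group != best_group:
            # Guard against log10(0) for E-values reported as 0.0.
            gap = math.log10(max(evalue, 1e-300)) - math.log10(max(best_e, 1e-300))
            return gap <= max_log10_gap
    return False  # every hit comes from the same group


# A eukaryotic best hit with a bacterial hit of nearly the same E-value.
print(best_hit_is_ambiguous([("Eukaryota", 1e-6), ("Bacteria", 2e-6)]))
# A eukaryotic best hit that is far stronger than the bacterial one.
print(best_hit_is_ambiguous([("Eukaryota", 1e-8), ("Bacteria", 1e-5)]))
```

Only the first hit from a different group is compared against the best hit, because later hits from that group can only be weaker.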
Minor comments:
It is not possible for the reader to identify the spacers listed in tables 1-10. It would be helpful to number the spacers of each organism and to link them to the additional data set, in which the spacer sequences are explicitly given (suppl. No. 12)
Figure : colour of Nanoarchaeota does not match with colour legend
Figure : not all loci have 150 spacers. Explain how this figure should be interpreted.
thermophilic, halophilic is misspelled
Author's response:
I think that the authors should consider (and discuss) another potential mechanism to explain the general picture: Spacers, that were perhaps originally self-directed against the own genome and thus raised autoimmunity, might have caused selection for organisms, that have lost the respective target gene. In that case, the spacer, which remained, was originally derived from a gene of the own chromosome, but in today's blast searches the next best match appears to another (maybe even distantly related) organism. The authors might find out about this possibility, by checking if another orthologue of that gene is found in the chromosome of the spacer-carrying organism or not.
This is an interesting suggestion, although one would still need to explain how the autoimmunity emerged in the first place, so exposure to foreign DNA that is similar is not altogether excluded. We performed the analysis suggested and added the following paragraph: "For spacers that match non mobility-associated ORFs in other species, one may also consider an alternative explanation, other than horizontal gene transfer. Spacers that may have caused autoimmunity could have led to a loss of the self-targeted gene, while orthologs of that gene persist in other related species. While this scenario can never be totally ruled out, one would expect that such a phenomenon will often rely on existence of related gene (or genes) in the organism that possessed the spacer that compensate for the lost homolog. We therefore looked for homologous genes by a BLASTX search of the gene matching the spacer against the proteome of the organism where the spacer originated, requiring E-value < E-5; > 66% sequence coverage and > 50% sequence similarity. Out of 26 spacers that had best matches in coding genes of other species (Additional Table S6, Additional Table S8), two encode conserved essential proteins that were presumably never lost (Orc1 and PCNA), 21 genes had no BLASTX matches, and only three had related proteins present. Thus, it appears that in general it is horizontal gene transfer, rather than gene loss driven by autoimmunity that underlies the accumulation of these spacers."
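The three BLASTX criteria quoted in this paragraph can be written as a simple filter. This is only a sketch: it reads "E-5" as 1e-5, and the `Hit` record, its field names, and the interpretation of coverage and similarity as percentages are assumptions made for illustration, not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLASTX hit; the field names are hypothetical."""
    evalue: float
    coverage_pct: float    # assumed: percent of the sequence covered
    similarity_pct: float  # assumed: percent similar positions


def passes_thresholds(hit, max_evalue=1e-5, min_cov=66.0, min_sim=50.0):
    """Apply the quoted criteria: E-value < 1e-5, coverage > 66%, similarity > 50%."""
    return (
        hit.evalue < max_evalue
        and hit.coverage_pct > min_cov
        and hit.similarity_pct > min_sim
    )


def has_related_protein(hits):
    """True if at least one hit in the spacer-carrying organism passes."""
    return any(passes_thresholds(h) for h in hits)


# Tally reported in the paragraph above; the three categories account for all 26 spacers.
reported = {
    "conserved essential (Orc1, PCNA)": 2,
    "no BLASTX match": 21,
    "related protein present": 3,
}
print(sum(reported.values()))  # 26
```

A gene would count as having a related protein in the spacer-carrying organism only if at least one of its hits satisfies all three criteria at once.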
Furthermore, I think it is of importance, to include in the analyses more than just the best BLAST matches. If e.g. a Methanococcus spacer matches a human sequence best, but the second best match with an almost identical e-value is to a bacterium, then I would be far less convinced of a eukaryotic-archaeal transfer.
We have addressed this point by adding an additional dataset that includes all hits that passed our threshold, not just the best hits (Additional dataset 2). Regarding the hits to eukaryotes, there were no close matches to bacteria except for one case: the spacer that produced 29/29 nucleotide identity against two insect genomes also gave a 27/27 match to the hrpW gene in Pseudomonas viridiflava, a plant pathogen. Curiously, this gene encodes an effector protein that elicits plant responses, so it is imaginable that parts of it have been transferred from bacteria to eukaryotes or vice versa. In any case, two fewer identical residues mean a weaker similarity by more than an order of magnitude.
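The order-of-magnitude statement can be checked with a rough calculation based on the Karlin-Altschul relation E = K·m·n·exp(-λS). The sketch below assumes ungapped alignments, uniform base frequencies, and a match/mismatch scoring of +1/-3; the response does not state the scoring scheme actually used, so these parameters are assumptions, and effective-length corrections are ignored.

```python
import math


def ungapped_lambda(match=1, mismatch=-3, p_match=0.25):
    """Solve p*exp(lam*match) + (1-p)*exp(lam*mismatch) = 1 for lam > 0.

    With uniform base frequencies, two random bases match with
    probability 0.25. The positive root is found by bisection.
    """
    def f(lam):
        return (
            p_match * math.exp(lam * match)
            + (1.0 - p_match) * math.exp(lam * mismatch)
            - 1.0
        )

    lo, hi = 1e-6, 10.0  # f(lo) < 0 and f(hi) > 0 for these scores
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def evalue_fold_change(extra_matches, match=1, mismatch=-3):
    """Factor by which E grows when a perfect alignment loses
    `extra_matches` matched positions, with the search space held fixed."""
    lam = ungapped_lambda(match, mismatch)
    return math.exp(lam * match * extra_matches)


# 29/29 versus 27/27: two fewer matched positions.
print(round(ungapped_lambda(), 3))
print(round(evalue_fold_change(2), 1))
```

Under these assumptions λ is about 1.37, so losing two matched positions raises the E-value roughly 15-fold, which agrees with "more than an order of magnitude". With +2/-3 scoring the same calculation still gives a factor above 10.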
Minor comments:
It is not possible for the reader to identify the spacers listed in tables 1-10. It would be helpful to number the spacers of each organism and to link them to the suppl. data set, in which the spacer sequences are explicitly given (suppl. No. 12).
To address this, we have now included the CRISPRdb reference of each spacer in all the additional tables, which have been re-done and are provided as Excel files. Thus, it is now easier to identify individual spacers.
Figure : colour of Nanoarchaeota does not match with colour legend.
Fixed
Figure : not all loci have 150 spacers. Explain how this figure should be interpreted.
We have re-phrased the legend of this figure and hope it is clearer now.
thermophilic, halophilic is misspelled.
Fixed