Our results suggest that the functionally uncharacterized proteins grouped together in COG4636 form a branch of the PD-(D/E)XK superfamily that has not been identified to date owing to the presence of an unusual variant of the active site, which lacks the conserved Lys residue at its typical position in the primary sequence. That the catalytic Lys can migrate within the framework of the active site of PD-(D/E)XK nucleases has been suggested earlier, based on the sequence analysis of another nuclease domain found in site-specific, non-long terminal repeat retrotransposable elements [2], but to date no molecular model has been offered to suggest the alternative point of attachment of the side chain to the protein backbone. Our sequence analysis of the COG4636+ family and the structural model of one of its members explain the difficulty of identifying the PD-(D/E)XK motif at the sequence level and provide a platform for further studies. Specifically, our analysis points to the most interesting members of the family, which display previously unobserved variants of the PD-(D/E)XK active site. Experimental analyses of these proteins and determination of the role of individual amino acids in the evolutionary context may help to better understand the plasticity of the PD-(D/E)XK active site and may help settle the controversy in the field of nucleases regarding the mechanism(s) of the reaction.
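To illustrate why a migrated Lys hampers motif detection at the sequence level, the following minimal sketch scans a sequence for a deliberately simplified pattern. The regular expression, the spacer bounds, the function name, and both toy sequences are assumptions introduced here for illustration only; they are not taken from our analysis and do not represent real COG4636+ members.

```python
import re

# Hypothetical, simplified rendering of the canonical motif: an Asp, a
# variable spacer, then (D/E)-X-K. The spacer bounds (5 to 30 residues)
# are an assumption chosen only for this illustration.
CANONICAL = re.compile(r"D.{5,30}[DE].K")


def has_canonical_motif(seq: str) -> bool:
    """Return True if the simplified canonical pattern occurs in seq."""
    return CANONICAL.search(seq.upper()) is not None


# Made-up toy sequences, not real proteins.
# Lys sits at the typical position, two residues after the Glu.
canonical_like = "MSPD" + "A" * 10 + "ELK" + "GG"
# Lys is absent from the typical position and appears further downstream.
variant_like = "MSPD" + "A" * 10 + "ELA" + "GGKG"

print(has_canonical_motif(canonical_like))
print(has_canonical_motif(variant_like))
```

A naive pattern of this kind accepts the first toy sequence and rejects the second, although both contain the acidic residues and a Lys. This is the kind of failure that a structural model, rather than a linear motif search, is needed to resolve.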
Phylogenomic analyses show that putative nucleases grouped in the COG4636+ family are exceptionally abundant in the genomes of certain Cyanobacteria, but absent from others. They are typically abundant in the sequenced genomes of freshwater species, but scarce in the genomes of marine species, with the exception of C. watsonii WH 8501, which was isolated from tropical waters of the Western Atlantic and Pacific oceans. It is remarkable that members of COG4636+ are almost absent from the genomes of Synechococcus species thriving in the Sargasso Sea, as well as from the environmental samples isolated from that region. On the other hand, in G. violaceus PCC 7421 they comprise over 2% of all protein-encoding genes. This phylogenetic distribution resembles that of mobile genetic elements such as introns or insertion sequences (reviews: [41]) and suggests that the contemporary COG4636+ family originated from a few predecessors that underwent extensive horizontal gene transfer and massive proliferation in certain genomes. Monophyly of COG4636+ sequences in non-Cyanobacterial species strongly suggests that proliferation occurred in each of these species independently, following a single event of colonization by horizontal transfer from a Cyanobacterium (or, in the case of T. thermophilus, three independent successful colonizations).
We hypothesize that the mechanism by which these putative nucleases induce their proliferation in a genome is similar to that displayed by homing nucleases and restriction enzymes [43], namely incision of the DNA by nicks or double-strand breaks, which stimulates recombination and may lead to tandem duplications and a variety of genomic rearrangements [44]. Frequent cleavage of the genomic DNA would be lethal for the cell; therefore, if members of COG4636+ are indeed active as nucleases, they should target rare sequences (in a manner similar to homing endonucleases; review: [48]) or unusual structures in the DNA (similarly to the structure-specific Holliday junction resolvases), or their activity would have to be regulated (inhibited) by interactions with other proteins or cellular processes (for instance by DNA modification). There are known examples of Holliday junction resolvases carried on defective lambdoid prophages [49]. Unfortunately, analysis of the genomic neighborhood shows no preferred association of COG4636+ members with any mobile genetic elements or particular gene families that could hint at the cellular processes they take part in or suggest how their predicted nuclease activity could be inhibited or regulated. In particular, we found no correlation with the presence of known or putative methyltransferases. This suggests that, despite sharing the common PD-(D/E)XK fold with REases, COG4636+ members are unlikely to serve as parts of restriction-modification systems, which are known to be abundant in Cyanobacteria [50]. It must be noted, however, that multiple solitary DNA methyltransferases were reported in Anabaena PCC 7120 [51], and these enzymes could potentially provide protection against the cleavage of the chromosomal DNA by at least some of the COG4636+ members found in this organism.
One possibility is that COG4636+ members serve as a part of the restriction barrier, similarly to the unrelated NucA family of extracellular nucleases found in Cyanobacteria, e.g. Anabaena sp. PCC 7120 [52] and Microcystis. They could also fulfill a role in maintaining the identity of the species by controlling the flow of incoming DNA, as recently suggested for restriction-modification systems [54]. From the genomic analyses it appears, however, that the primary function of COG4636+ members is to spread and multiply, and their cellular roles may be merely side effects of this selfish expansion. It is very likely that their nuclease activity is recombinogenic and may increase the frequency of genomic rearrangements. Moreover, the multiplication of closely related COG4636+ members in certain genomes leads to an abundance of dispersed related DNA sequences, which by themselves may increase the frequency of genome rearrangements by homologous recombination. It was suggested that in the marine Cyanobacteria the factors that increase genome plasticity might not be promoted by natural selection due to the homeostatic environment of the open ocean [55]. Conversely, the unstable environment of fresh waters might promote the spread of factors that destabilize the genome by increasing the frequency of recombination and thereby increase the diversity of the population. This is in good agreement with our finding that COG4636+ members are prevalent in Cyanobacteria that thrive in fresh waters and scarce in marine species (with the exception of C. watsonii WH 8501). In summary, it is plausible that members of COG4636+ fulfill an important role in the genome dynamics of Cyanobacteria and other species they colonize. We hope that our predictive study will facilitate experimental determination of the molecular and cellular function of members of this intriguing protein family.