Restriction endonucleases and their nuclease relatives function to cleave a variety of nucleic acid substrates in various cellular processes. The SCOP (1) database currently assigns 20 different families to the restriction endonuclease-like superfamily, including 13 restriction endonucleases (2) and various other nucleic acid cleaving enzymes such as lambda exonuclease (3,4), DNA mismatch repair protein (MutH) (5), very short patch repair (Vsr) endonuclease (6), N-terminal domain of TnsA endonuclease (7), endonuclease I (8), archaeal Holliday junction resolvase (Hjc) (9) and XPF/Rad1/Mus81 nuclease (10). The cleavage reactions performed by these highly diverse proteins contribute to important biological functions, such as protecting host organisms against foreign DNA invasion (restriction endonucleases), repairing damaged DNA (MutH and Vsr), resolving Holliday junctions (endonuclease I, Hjc and XPF/Rad1/Mus81-dependent nuclease) and performing additional cleavage events in DNA recombination (lambda exonuclease and TnsA).
The restriction endonuclease-like superfamily is defined by a common core fold that includes a four-stranded, mixed β-sheet flanked on either side by an α-helix (αβββαβ topology, ). Residues within a relatively conserved PD-(D/E)XK motif (Motifs II and III, ) mark the active site and contribute to cleaving the nucleic acid phosphodiester bond (4,11,12). In addition to this named motif, a conserved acidic residue often resides at the N-terminus of the first core α-helix (Motif I, ), while a conserved residue from the second helix points toward the active site (Motif IV, ) in a subset of families. These residues play various catalytic roles, which include coordination of up to three divalent metal ion cofactors, depending on the family. The shared structural and functional features of restriction endonuclease-like families have been interpreted as evidence for a common evolutionary origin and have been exploited by various groups to identify and group endonuclease sequences (2,12–14).
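As a rough illustration of how the PD-(D/E)XK motif can be used to flag candidate active-site regions, the sketch below scans a protein sequence with a regular expression. The spacer bounds between the PD and (D/E)XK elements (`MIN_SPACER`, `MAX_SPACER`), the function name `find_pd_dexk` and the toy sequence are assumptions made for this example and are not taken from the cited studies. Such a rigid pattern is only a crude filter and would likely miss divergent family members.

```python
import re

# Assumed bounds on the number of residues separating "PD" from "(D/E)XK".
# These values are illustrative only, not taken from the literature.
MIN_SPACER, MAX_SPACER = 5, 40

# A lookahead lets overlapping candidates starting at different positions be reported.
PATTERN = re.compile(r"(?=(PD.{%d,%d}?[DE].K))" % (MIN_SPACER, MAX_SPACER))


def find_pd_dexk(seq):
    """Return (start, segment) for each candidate PD...(D/E)XK site in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence constructed for this example, not a real protein.
    toy = "MKTPD" + "A" * 10 + "EVKGG"
    for start, segment in find_pd_dexk(toy):
        print(start, segment)
```

The lazy quantifier reports the shortest spacer that completes the pattern from each PD start position.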
In addition to sequence- and structure-based methods, analysis of genomic context and domain fusions has led to the identification of new restriction endonuclease-like domains (15,16). Restriction endonuclease-like proteins frequently cooperate with their genome neighbors to perform specific biological functions. For example, restriction-modification systems include a restriction endonuclease and a methyltransferase that function together to protect cells against foreign DNA. The two genes encoding these enzymes often reside adjacent to each other in genomes or can be found fused in a single gene. Similarly, domain fusions exist between restriction endonuclease-like proteins and superfamily I/II helicases, suggesting a close functional association of nuclease activity with ATP-dependent DNA helix unwinding. Other functional fusions of the restriction endonuclease-like fold involve several different types of Zn-binding and DNA-binding domains. Analysis of these conserved gene neighborhoods and gene fusions has allowed the prediction of a novel prokaryotic DNA repair system (16) that includes previously identified members of the restriction endonuclease-like fold group [RecB nuclease domain (12)]. In addition to implying functional associations, similar conservation of domain fusions and genomic organization helps justify new restriction endonuclease-like fold predictions for increasingly divergent families.
To further expand the realm of the restriction endonuclease-like superfamily, we combine the concept of transitivity (‘If A is B and B is C, then A is C’) with the fold recognition approach Meta-BASIC (http://basic.bioinfo.pl) to identify nine new uncharacterized protein families as endonucleases. Meta-BASIC combines sequence profiles and secondary structure predictions (meta profiles) for given protein families [currently PfamA (17) or PDB90 (18)] with various scoring systems and meta profile alignment algorithms to quickly establish links between families of both known and unknown structure. A query with an endonuclease of known structure (1gef) yields numerous hits to potential and known restriction endonuclease-like fold families, including a domain from an existing structure that was not identified by structure-based methods. Although many of the scores assigned to these hits fall below the confidence threshold of 12 [predictions with Z-score above 12 have <5% probability of being incorrect (19)], we can further extend the limits of Meta-BASIC detection by applying the concept of transitivity to identified sequence groups. Results for the restriction endonuclease-like fold superfamily suggest that our sequence-based fold recognition approach can provide additional information about structural similarities in the realm of the ‘midnight zone’ (2) and can identify divergent folds in new and existing structures that structure-based identification methods may fail to find.
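The transitivity idea can be made concrete with a small sketch: families are treated as nodes, pairwise scores above the threshold as links, and a family counts as related to the query if a chain of confident links connects them, even when its direct score against the query falls below the threshold. This is a minimal illustration and not the Meta-BASIC implementation. The function name `transitive_hits`, the family labels and the score values are hypothetical, and treating links as symmetric is an assumption of the example.

```python
from collections import deque

Z_THRESHOLD = 12.0  # confidence cutoff quoted in the text


def transitive_hits(query, scores, threshold=Z_THRESHOLD):
    """Return families linked to `query` by chains of scores above `threshold`.

    `scores` maps a pair (a, b) to a score. Links are treated as symmetric,
    which is an assumption of this sketch.
    """
    neighbors = {}
    for (a, b), z in scores.items():
        if z > threshold:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)

    # Breadth-first search over confident links, starting from the query.
    seen = {query}
    queue = deque([query])
    while queue:
        current = queue.popleft()
        for nxt in neighbors.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(query)
    return seen


if __name__ == "__main__":
    # Toy scores invented for illustration; not Meta-BASIC output.
    toy_scores = {
        ("query", "famA"): 15.2,  # confident direct hit
        ("famA", "famB"): 13.1,   # confident link between families
        ("query", "famB"): 8.4,   # direct score below threshold
        ("query", "famC"): 6.0,   # no confident path to the query
    }
    print(sorted(transitive_hits("query", toy_scores)))
```

In the toy data, `famB` is recovered through `famA` although its direct score against the query is below 12, whereas `famC` remains unlinked.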