The His-Me finger endonucleases, also known as HNH or ββα-metal endonucleases, form a large and diverse protein superfamily. The His-Me finger domain can be found in proteins that play an essential role in cells, including genome maintenance, intron homing, host defense and target offense. Its overall structural compactness and non-specificity make it a perfectly-tailored pathogenic module that participates on both sides of inter- and intra-organismal competition. An extremely low sequence similarity across the superfamily makes it difficult to identify and classify new His-Me fingers. Using state-of-the-art distant homology detection methods, we provide an updated and systematic classification of His-Me finger proteins. In this work, we identified over 100 000 proteins and clustered them into 38 groups, of which three groups are new and cannot be found in any existing public domain database of protein families. Based on an analysis of sequences, structures, domain architectures, and genomic contexts, we provide a careful functional annotation of the poorly characterized members of this superfamily. Our results may inspire further experimental investigations that should address the predicted activity and clarify the potential substrates, to provide more detailed insights into the fundamental biological roles of these proteins.
The His-Me finger domain occurs in a large and diverse superfamily of endonuclease proteins present in all forms of life and involved in various cellular processes. In the literature, it is also dubbed the ‘HNH’ endonuclease (after its conserved sequence motif) or the ββα-metal endonuclease (denoting the conserved secondary structure of the fold and bound catalytic metal ion) (1).
Its function is associated with the catalytic cleavage of nucleic acids in multiple contexts, such as intron homing, phage DNA packaging, recombination or apoptotic DNA degradation. Moreover, His-Me finger-containing proteins are often a key part of the defence and stress response systems of bacteria and archaea, e.g. restriction–modification, toxin–antitoxin and CRISPR/Cas9 systems.
The His-Me finger is defined by a common structural core consisting of a β-hairpin followed by an α-helix, forming a binding site for a single catalytic metal ion (Figure 1). The finger might be considered as a part of the treble clef structural motif, which is formed by a zinc knuckle, loop, β-hairpin and α-helix (1). However, since many His-Me finger representatives lack the zinc knuckle (2,3), they escape the orthodox treble clef classification.
The His-Me finger is thought to be suited for efficient, nonspecific DNA cleavage (4,5). The α-helix fits into the DNA minor groove, which in consequence precisely aligns the remaining β-hairpin against the DNA backbone for nucleolytic reaction (6–8). In structure-specific enzymes, such as the Holliday junction resolvases, the core α-helix contacts the minor groove side of the branch point (9). The finger alone displays little substrate specificity, but may be easily tailored to a particular function by the presence of additional domains or structural elements tethered to the core of the ββα-metal fold (3,10). The single metal ion-based catalytic site keeps the domain compact; however, its small size means it cannot easily achieve structural stability. Indeed, its three-element structural motif is unable to provide enough of a hydrophobic core for stability. Thus, many proteins containing the His-Me finger domain employ a variety of additional structural elements or domains for stabilization and/or specificity. These include embedded zinc binding motifs, like the zinc knuckle of the treble clef mentioned above (e.g. in restriction or CRISPR nucleases), or additional decoration ranging from single α-helices to extensive β-sheets—e.g. in DNA degrading nucleases like endonuclease G (EndoG) (11) and CPS-6 (12), or virulence factors like nuclease A (NucA) (13), Serratia marcescens nuclease (Sm) (14), and streptodornase (Spd1) (15).
The term ‘His-Me’ refers to a nearly invariant catalytic histidine (His) residue and a bound metal ion (Me). His-Me finger-containing proteins are commonly called HNH nucleases, due to the sequence motif present in the founding superfamily members. The HNH sequence motif consists of the central, nearly invariant, catalytic histidine at the C-terminus of the first β-strand (the first H of HNH), an asparagine (the N of HNH) in the extensive loop connecting the two β-strands, known as the Ω-loop (16), and a histidine (the second H of HNH) in the core α-helix. This second histidine is often substituted by an asparagine, which, in most cases, is the only residue that directly binds the metal ion (Figure 1).
The catalytic site of the His-Me finger consists of the catalytic histidine and a divalent metal ion ligand, coordinated mainly by the core α-helix residues through either direct or water-mediated interactions (Figure 1). During the nucleolytic reaction, the ion (predominantly Mg²⁺ (17,18)) destabilizes the scissile phosphodiester bond and neutralizes the negatively charged transition state. The catalytic histidine serves as a general base to deprotonate and activate a nucleophilic water molecule used for direct hydrolysis of the nucleic acid (19). In some cases, as in PacI restriction endonuclease, the histidine is absent, and its role is taken by a tyrosine located in the Ω-loop (20). In other cases, such as RecA-dependent nuclease (2), Vibrio vulnificus nuclease (Vvn) (21) or endonuclease I (22), the active site is supplemented by an arginine, located in the Ω-loop or in the N-terminal α-helix, which stabilizes the cleaved phosphate.
Despite their common structural core and sequence motifs, His-Me fingers display extreme sequence diversity, which makes identification of new superfamily members difficult. Moreover, even for known 3D structures, it is difficult to annotate the HNH nuclease domain as it may be distorted and/or buried within a broader structural context. Nonetheless, due to the high biological importance and widespread occurrence of the His-Me proteins, there is a need for their systematic detection and classification. A previously published classification (23) was based on a sparse set of example proteins. However, since its publication in 2003 there has been an explosion of sequence, structural and biochemical data, rendering the classification somewhat outdated. Here, we apply state-of-the-art distant homology detection, combined with extensive sequence and literature searches, to collect the most complete set of proteins containing the His-Me finger domain. Detailed analysis of their sequence-structure-function relationships provides an updated, comprehensive classification of the His-Me finger superfamily.
Firstly, all known His-Me finger families were obtained from the His-Me_finger ‘clan’, as defined by the Pfam database (24). This initial set of His-Me finger domains was used for further searches using our Gene Relational DataBase (GRDB) system based on Meta-BASIC (25), a highly sensitive method for distant homology detection, which compares sequence profiles, enriched by predicted secondary structure (meta-profiles). GRDB includes pre-calculated Meta-BASIC connections between PFAM, KOG and COG families, and proteins of known structure from PDB90, which contains representatives from the Protein Data Bank (PDB), filtered at 90% sequence identity. Each family and structure in the system was represented by its: (i) sequence (for PDB90 structures), or consensus sequence (for PFAM, COG and KOG families), (ii) sequence profile generated with PSI-BLAST (26) (3 iterations, with an inclusion threshold of 0.001) using the NCBI non-redundant protein sequence database derivative (NR70) and (iii) secondary structure, predicted using PSI-PRED (27). The search strategy was based on the concept of transitivity, where each newly identified PFAM, KOG, and COG family or PDB structure was used in further Meta-BASIC searches until no new additional His-Me fingers were found. In addition to the high-reliability Meta-BASIC predictions—with scores >40, corresponding to an E-value of 0.05—all hits scoring <40 were included to identify potentially correct predictions that may have been placed among the unreliable or incorrect ones. These potentially correct predictions were selected based on manual assessment of the conservation of the core secondary structure elements and sequence motifs deemed critical for the His-Me finger fold and nuclease function.
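The transitive search strategy described above can be sketched as a breadth-first expansion of a seed set. This is only an illustration of the idea, not the GRDB or Meta-BASIC implementation: the `search` callback, the family names and the toy hit table are hypothetical stand-ins, and the manual assessment of low-scoring hits is assumed to have happened inside the callback.

```python
from collections import deque
from typing import Callable, Iterable, Set


def transitive_search(seeds: Iterable[str],
                      search: Callable[[str], Iterable[str]]) -> Set[str]:
    """Expand a seed set until no query returns a previously unseen member."""
    found = set(seeds)
    queue = deque(found)
    while queue:
        query = queue.popleft()
        for hit in search(query):
            if hit not in found:
                # Every newly accepted family or structure becomes a query.
                found.add(hit)
                queue.append(hit)
    return found


# Hypothetical stand-in for a homology search: query -> accepted hits.
TOY_HITS = {
    "famA": ["famB"],
    "famB": ["famA", "famC"],
    "famC": [],
    "famD": ["famE"],  # never reached from the seed
}


def toy_search(query: str) -> Iterable[str]:
    return TOY_HITS.get(query, [])


members = transitive_search(["famA"], toy_search)
print(sorted(members))
```

The loop terminates because each family is queued at most once, which corresponds to stopping when no new His-Me fingers are found.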
The resulting sequences of His-Me finger representatives retrieved from GRDB and literature searches served as a starting point for exhaustive transitive PSI-BLAST (26) searches against the NCBI non-redundant (NR) protein sequence database (as available in January 2016), with six iterations and an E-value threshold of 0.001. Specifically, at each step, the collected sequences were clustered at a 40% identity threshold with CD-HIT (28), filtered with regard to false positive hits, and used as an input for the subsequent PSI-BLAST runs until convergence.
The sequences of all the His-Me finger domains were compared, all-against-all, using protein BLAST to detect clusters of closely related proteins. Given that sequences within a superfamily can have different levels of similarity, we tested various thresholds, spanning from 10⁻⁵ to 10⁻¹¹. As a result of our benchmarks, the E-value cut-off for clustering was set to 10⁻⁸. The sequence space was visualized and clustered using the Cytoscape software and prefuse force-directed layout (Supplementary Figures S1 and S2) (29).
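To illustrate how the choice of E-value cut-off changes the grouping, the following sketch links two sequences whenever their pairwise E-value is at or below the cut-off and reports the connected components. It is an assumption-laden simplification: the clustering in this work was done on a Cytoscape layout, whereas the union-find procedure, the function name `cluster_by_evalue` and the toy hits below are illustrative only.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, E-value)


def cluster_by_evalue(ids: Iterable[str], hits: Iterable[Hit],
                      cutoff: float) -> List[List[str]]:
    """Single-linkage clusters: connect pairs whose E-value is <= cutoff."""
    ids = list(ids)
    parent: Dict[str, str] = {i: i for i in ids}

    def find(x: str) -> str:
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, evalue in hits:
        if evalue <= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters: Dict[str, List[str]] = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first, then alphabetical, for a stable output order.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical all-against-all results for five sequences.
IDS = ["p1", "p2", "p3", "p4", "p5"]
HITS = [("p1", "p2", 1e-30), ("p2", "p3", 1e-9), ("p3", "p4", 1e-6)]

for cutoff in (1e-5, 1e-8, 1e-11):
    print(cutoff, cluster_by_evalue(IDS, HITS, cutoff))
```

In this toy data a permissive cut-off merges four sequences into one cluster, while stricter cut-offs split them progressively, which is the trade-off a threshold benchmark has to balance.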
For each of 56 identified groups (clustered at 70% sequence identity, using CD-HIT), a multiple sequence alignment was generated using Mafft (with the –linsi option) (30). The alignments were used for building HMM profiles using hmmbuild (with default parameters) from the HMMER3 suite (31). The HMM profiles were then compared using hhsearch (with default parameters). Groups sharing high similarity (E-value < 10⁻⁹ between their representatives) were combined. The resulting network of 38 groups was visualized via Cytoscape (Figure 2).
The domain architectures of all the sequences were analyzed using hmmscan from the HMMER3 suite (31), run against the Pfam database, with an E-value threshold of 10⁻⁵. If two consecutive domains overlapped by >30% of the length of the shorter domain, the domain with the lower E-value was retained. Transmembrane regions were detected using the TMHMM package (32). Gene neighbors were determined using the KEGG GENOME database, and their protein products were analyzed with hmmscan, run against the Pfam and CDD databases (E-value < 10⁻⁵).
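The overlap rule for consecutive domains can be written down as a short filter. The sketch below is a minimal reading of that rule, not the script used in this work: the `DomainHit` record, the 1-based inclusive coordinates and the example hits are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DomainHit:
    name: str
    start: int      # 1-based, inclusive (assumed convention)
    end: int        # 1-based, inclusive
    evalue: float

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlap_fraction(a: DomainHit, b: DomainHit) -> float:
    """Shared residues divided by the length of the shorter domain."""
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    return max(shared, 0) / min(a.length, b.length)


def resolve_overlaps(hits: List[DomainHit],
                     max_overlap: float = 0.30) -> List[DomainHit]:
    """Walk domains in sequence order; on a conflict keep the lower E-value."""
    kept: List[DomainHit] = []
    for hit in sorted(hits, key=lambda h: (h.start, h.end)):
        if kept and overlap_fraction(kept[-1], hit) > max_overlap:
            if hit.evalue < kept[-1].evalue:
                kept[-1] = hit  # the newcomer is the more significant hit
        else:
            kept.append(hit)
    return kept


# Hypothetical hits on one protein.
HITS = [
    DomainHit("HNH", 10, 60, 1e-12),
    DomainHit("HNH_3", 40, 80, 1e-7),   # overlaps HNH by 21 of its 41 residues
    DomainHit("AP2", 100, 150, 1e-6),
]

print([h.name for h in resolve_overlaps(HITS)])
```

Only neighbouring domains are compared here, matching the wording "two consecutive domains"; a stricter implementation might re-check a replacement hit against the domain kept before it.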
All identified His-Me finger domain structures were clustered at 90% sequence identity and manually superposed on their core structural elements (ββα) using Swiss PDB Viewer (33). The residue correspondence was assessed manually based on visual inspection of the superposition, with respect to their locations in secondary structure elements and contribution to the active site formation.
Representative sequences from structurally uncharacterized families were added to the structural alignment based on Meta-BASIC mappings as well as secondary structure predictions and conservation of key residues predicted to form the catalytic site.
The current release of PfamA (version 31.0) consists of 23 families belonging to the His-Me_finger clan (CL0263), whereas the SCOP database divides His-Me finger structures (SCOP ID: 54060) into six families. Nevertheless, to date, many of these families are poorly characterized, and their role in the cell remains elusive.
In this work, we conducted exhaustive distant homology detection searches, combining a previously established and successful methodology for studying large and diverse protein superfamilies (34,35) with a series of PSI-BLAST searches. We identified over 111 000 proteins in the NCBI NR database containing the His-Me finger fold, which we classified into 38 distinct groups (summarized in Table 1) based on their sequence similarity.
Proteins containing the His-Me finger domain display significant structural diversity, either within the His-Me finger itself, or in the additional structures supporting the catalytic domain. The core β-hairpin is always present, performing a strictly defined function. The hairpin along with the extensive Ω-loop, inserted between the consecutive β-strands, provides catalytic residues and stabilizes the overall structure. The core α-helix exhibits an extraordinarily broad repertoire of variants, given the overall small size of the His-Me finger. For instance, apart from obvious differences in length, this helix can be disrupted by a protruding ‘helical finger loop’ of unknown function, as observed in the NucA (13), Spd1 (15), Ref (2), EndA (36) and Sda1 (37) nucleases. An extreme example is provided by the viral RecA-dependent Ref nuclease, where the core α-helix hosts a helical finger almost twice as long as the helix itself (2). In all these cases, however, the α-helix retains its integrity and orientation relative to the β-hairpin (Figure 3A). It also preserves residues responsible for metal ion coordination. An additional role is assigned to the α-helix in the Holliday structure resolvase, ENDOVII (9), where it is essential for dimerization (Figure 3B), as well as for orienting the dimer towards the junction.
The Ω-loop, being an insertion, exhibits a more obvious variation, from being strongly reduced as in ENDOVII resolvase (9) or Hpy99I restriction endonuclease (7), to becoming very elaborate, carrying additional secondary structure elements, as in CAD (38), Vvn (21) or endonuclease I (22) (Figure 3C). The Ω-loop may also host residues for coordination of a zinc ion, as observed in I-PpoI homing endonucleases (6) or inactive Smad (39), that additionally stabilizes the fold.
In general, zinc ions play an important role in stabilizing many His-Me finger structures. Apart from binding to the Ω-loop as described above, zinc may also be coordinated within a ‘zinc knuckle’ at the interface between the N-terminal part of the α-helix and an additional β-hairpin structure preceding the His-Me finger. This feature is present in a range of proteins bearing the His-Me finger domain, such as CRISPRs, homing endonucleases, CAD (Caspase-Activated DNase) DNA-degrading nuclease (38), ENDOVII HJC (9) and GmR87 nickase of unknown function (Figure 3D).
Stabilization of the His-Me finger core may also be provided by extensive structures. This is characteristic of certain virulence factors, such as Sm (14), NucA (13), Spd1 (15) and Sda1 (37), as well as of EndoG (11), EXOG and CPS-6 (12), which have an additional β-sheet at the ‘back’ of the HNH structure (Figure 3E). In fact, in these cases, the His-Me finger domain is inserted into a larger β-sheet. Similarly, Vibrio vulnificus nuclease (Vvn) (21) and endonuclease I (22) retain a large α-helical domain located at the C-terminus sheltering the central His-Me finger nuclease (Figure 3C), whereas toxins (e.g. Colicin E9 (40), Pyocin S2 (41)) possess expanded structural decorations at the N-terminus.
The His-Me catalytic site is set up around the axis connecting two key residues: the nearly invariant histidine, located in the first β-strand, and the asparagine/histidine in the N-terminal turn of the α-helix (Figure 3). The latter residue directly coordinates a divalent metal ion. In addition, the ion is often coordinated by D/E/H residues preceding the catalytic histidine in the first β-strand, and another polar residue along the α-helix, in the +4 or +8 position with respect to the asparagine/histidine mentioned above (Figure 4).
Based on an analysis of the conservation of the active site residues, most of the identified His-Me finger domains appear to be active nucleases with histidine serving as a general base (Figure 3). However, some groups display considerable variation within the predicted active site. Three groups defined in our sequence clustering (i.e. groups 16, 20, 21) lack the catalytic histidine. Structural and biochemical studies suggest that a tyrosine located in the extended Ω-loop in group 20 substitutes for histidine in the catalyzed reaction (Figure 3F) (20). As was shown for PacI endonuclease, in the one-metal-ion mechanism, the tyrosine side-chain may act as a nucleophile, substituting for water utilized in histidine-dependent nucleases (19). A similar mode of action is typical for HUH endonucleases (42).
The remaining two groups, 16 and 21, lack both histidine and equivalent tyrosine residues, indicating a loss of nuclease function. Group 16 corresponds to the MH1 domain of Smad (dwarfin-A) proteins. According to previous studies (43,44), MH1 is a modified endonuclease that has been recruited as a transcriptional activator. Although it has lost the catalytic activity, it has retained the DNA-binding capability (43). Group 21 contains the predicted ectonucleotide pyrophosphatase/phosphodiesterase-1, -2 and -3 (Enpp1, Enpp2/autotaxin, Enpp3) and venom phosphodiesterases. The C-terminal regions of the proteins contain a nuclease-like His-Me finger domain, which is catalytically inactive and has been suggested to bind the substrate (45).
To perform a systematic, context-independent classification of the His-Me finger superfamily, we clustered the collected sequences (as described in detail in Materials and Methods) and subsequently assigned them to 38 groups, corresponding to clusters of similar sequences (Figure 2).
Out of the 38 defined groups, three (designated here with numbers 20, 23 and 28) are novel, i.e., neither Pfam, CDD nor SCOP contains a corresponding entry. Sequence similarity to already described His-Me fingers and predicted secondary structure suggest that they possess the His-Me finger fold. For group 23 the molecular function remains unknown, whereas the remaining two (20 and 28) have been studied in the context of single proteins.
Additionally, we identified seven Pfam families: DUF1364, RE_Alw26IDE, Tox-HNH-HHH, Tox-HNH-EHHH, Tox-SHH, Tox-GHH and Tox-GHH2, which have not been assigned to the His-Me finger Pfam clan but are predicted to have both the His-Me finger fold and the active site (46–48). DUF1364 clusters with DUF968, together forming group 7, and includes the Escherichia coli prophage protein ybcO of known structure (PDB ID: 3g27). A manual structure superposition to other members of this superfamily unambiguously classifies DUF1364 as a His-Me finger domain, which agrees with the hypothesis made previously (46).
Table 1 gives a complete list of the defined groups, along with their taxonomic distribution, correspondence to solved structures, a short description of their founding members, and their Pfam, COG and KOG assignments. Below, we describe selected groups in more detail (including those that have been poorly characterized or functionally unannotated), proposing their function and highlighting distinctive properties. All the domain architectures (Supplementary Figure S3), genomic neighborhoods (Supplementary Figure S4), sequence logos (Supplementary Figure S5) together with accession numbers of superfamily members (Supplementary Dataset S1) are available as Supplementary Data.
Group 1 embraces the RNA-guided Cas endonuclease from CRISPR/Cas9 systems. Cas possesses two well-conserved nuclease domains: RuvC and His-Me finger (HNH). All known Cas9 enzymes contain an HNH domain that cleaves the DNA strand complementary to the guide RNA sequence (target strand), and a RuvC nuclease domain required for cleaving the non-complementary strand (non-target strand), yielding double-strand DNA breaks (49). Group 1 also contains characterized restriction endonucleases, e.g. McrA from E. coli (accession: NP_415677.1) and MnlI (accession: AAU87367.1) which are specific to 5-methylcytosines (50,51). This group also includes vertebrate (including Homo sapiens (accession: NP_115519.2)) ATP-dependent annealing helicase (AH2/ZRANB3). The structure-specific His-Me finger domain of this protein cleaves the D-loop structure and produces an accessible 3′-OH group, which enables DNA polymerase action (52,53). Another important subset of group 1 comprises the His-Me finger-containing endonucleases from bacteriophages, which are key components of phage DNA packaging machines (54). Among them is a recently biochemically and structurally studied thermostable nicking endonuclease from Geobacillus virus E2 (PDB ID: 5h0o). It was proposed that an additional α-helix preceding the core β-hairpin may contribute to the enzyme's thermostability by maintaining its structural conformation (55).
This group constitutes a large ensemble of homing endonucleases found in organisms from across the tree of life, including the structurally characterized endonuclease I-HmuI (PDB ID: 1u3e). Other proteins from this group show a domain architecture typical of homing endonucleases, i.e. a number of single or tandem repeats of nuclease-associated modular DNA-binding domains (NUMOD) and other helix-turn-helix-type (HTH-type) domains, along with an Apetala 2 (AP2) domain, which are known to serve as DNA-binding modules conferring substrate specificity (56,57).
Proteins from this group belong to GmrSD restriction systems, which target and cleave DNA with glucosylated hydroxymethylcytosine (HMC) (58). GmrSD consists of GmrS and GmrD subunits, which both exhibit endonuclease activity towards DNA with sugar-modified HMCs (59). In addition, GmrS may also participate in NTP binding and hydrolysis. GmrS possesses a ParB/Srx fold, whereas GmrD has been shown to contain the HNH motif (59). These proteins usually possess additional C-terminal domains—DUF4357, DUF4268 and AlbA_2. The last of these, AlbA_2, belongs to the AlbA superfamily of DNA/RNA-binding domains (60). The molecular function of DUF4357 and DUF4268 remains unknown. According to our sequence-based predictions of their DNA-binding potential using DNABIND server (61), only DUF4268 is likely (with 98% probability) to bind DNA. Apart from REases, group 3 contains bacterial putative extracellular nucleases containing transmembrane helices and a C-terminal Excalibur domain (PF05901), which occurs in calcium-binding nucleases. The presence of close homologs in Agaricomycetes fungi and Agromyces bacteria may suggest a potential horizontal gene transfer between kingdoms.
Genes encoding bacterial proteins from these groups are located in direct genomic proximity to Smi1/Knr4 genes, whose products belong to the SUKH superfamily of immunity proteins. This suggests they might act as nucleic acid-degrading toxins in toxin-antitoxin systems, similarly to contact-dependent growth inhibition (CDI) operons (62). Proteins from groups 4 and 11 follow a widely conserved domain architecture. They contain pre-toxin secretion-related domains (PT-VENN, PT-HINT, RHS, filamentous hemagglutinin repeats), which facilitate export of a toxin from the host cell and eventually release the toxic nuclease domain upon uptake by the target cell. Additional transmembrane elements are presumably required for binding of the extruded toxin to the cell membrane (63). Interestingly, group 4 also contains single representatives from archaea and arthropods, e.g. two from an archaeon Methanosarcina barkeri and one from a wasp Diachasma alloeum, suggesting potential horizontal transfer.
Non-specific endonucleases from this group constitute a broad set of bacterial virulence factors. They target and degrade both double-stranded and single-stranded DNA and RNA with no sequence specificity, making them well-tailored for successful competition. Well-studied examples include streptococcal NET-degrading DNA-entry nuclease (EndA) (accession: WP_001036779), streptodornase (Spd1, Sda1) (accession: NP_268944), and nuclease A (NucA) (accession: WP_010999911), which all contain a characteristic DRGH sequence motif preceding the catalytic histidine (35). They have been associated with invasive bacterial infections by degrading chromatin-rich neutrophil extracellular traps (NETs) and enabling Streptococci to overcome the mammalian immune response (15,37,64,65).
Some proteins from this group constitute polymorphic toxins, which are deployed to inhibit the growth of neighboring cells. The group contains Bacillus YxiD toxins from the YxiD-YxxD toxin-antitoxin system which show cytotoxic RNase activity (accession: P42296) (66) and RhsA toxins (accession: WP_013316542). Both proteins are thought to be secreted from the cell in a contact-dependent manner (66,67). However, this hypothesis still needs experimental confirmation.
Interestingly, the His-Me finger domain from this group is also present in some serine proteases, e.g., from Pseudomonas aeruginosa (accession: OKR41748) and other Gram-negative species. These proteins are involved in virulence and secreted by the autotransporter pathway (70). Moreover, in the genomic neighborhood of the proteins from this group in the Campylobacter species, we identified genes encoding signal peptidases (accession: WP_002870274). They belong to the MEROPS S24 family (11), which embraces LexA and type I signal peptidases. These proteases cleave away the N-terminal signal peptide from the translocated protein precursor, thereby playing a crucial role in protein transport across membranes (71). Domains from this group are often coupled with peptidase-like domains, such as Trypsin_2 (PF13365), Peptidase_M8 (PF01457), and adhesin Mfa_like_1 (PF13149), which may participate in adhesion to the invaded cell. This may imply that these His-Me finger nucleases are functionally linked to proteases and enhance virulent activity against the invaded host.
The group also contains a large number of eukaryotic members. These include mitochondrial DNA/RNA endonucleases G and endonuclease G-like 1 (EXOG), which participate in chromosomal DNA degradation during apoptosis (68). As an apoptosis-related factor, EndoG is heavily regulated. For instance, the multimerization induced by proteolysis has been shown to trigger a shift from endonuclease to 5′–3′ exonuclease activity in Caenorhabditis elegans EndoG (11).
This group encompasses AlwI, BbrI and HphI restriction endonucleases (68–70). The exact mode of target sequence recognition for these restrictases is unknown. While the N-terminal domain of AlwI (accession: CBL24780) shows similarity to the BpuJI recognition domain (PF11564) (71), the recognition domain of BbrI and HphI could not be predicted reliably. As indicated by site-directed mutagenesis studies, the key residues responsible for substrate recognition lie outside the catalytic His-Me finger domain (70).
Uncharacterized members of this group appear to use various mechanisms to provide fidelity of sequence readout. A few of them have a SAD/SRA domain that recognizes hemi-methylated CpG dinucleotides and other 5mC containing dinucleotides, suggesting that some putative proteins may act as restriction enzymes that target methylated or hemi-methylated DNA rather than a specific nucleotide sequence (63). Interestingly, the DUF3427 domain (PF11907) tends to co-occur with the His-Me finger (e.g., in WP_052015257 and WP_052747245). Genes encoding proteins with DUF3427 together with the His-Me finger at the N- and C-terminus lie in close genomic proximity to genes that code for methyltransferases and ATPases (Supplementary Figure S4), respectively, suggesting a potential role in restriction for proteins containing HNH and DUF3427 domains.
This bacterial group contains multiple putative prophage-encoded proteins, including Qin prophage from Escherichia coli (YdfU) (accession: NP_416078), a phage transcriptional regulator from Arsenophonus nasoniae (accession: CBA76303), and DLP12 prophage (YbcO) of known structure (PDB ID: 3g27). The genomic neighbors of prophage-encoded members include, e.g., crossover junction endodeoxyribonuclease RusA, phage antitermination protein Q, and antirepressor protein, which are involved in transcription and recombination. Consistently, some members of this group have additional domains involved in single-strand DNA annealing during recombination, e.g. ERF (PF04404) and Rad52_Rad22 (PF04098), also of bacteriophage origin (72). Taken together, we hypothesize that the His-Me finger nuclease domain in these proteins acts as a recombinase that participates in DNA repair and replication, similarly to NinG and RecA-dependent endonucleases (73).
Escherichia coli endonuclease I (EndoI) is a bacterial sequence-independent periplasmic or secreted protein. Periplasmic nucleases, including Vibrio vulnificus nuclease (Vvn), EndoI, Dns, and DnsH, protect the cell against the uptake of foreign DNA during transformation, which results in a decrease of transformation rate (21,74). These endonucleases are not active under reducing conditions in the cytoplasm; however, the presence of disulfide bonds in their structures suggests that their activity is triggered by a transition to an oxidized form upon extrusion from the cell (21,75). Another known member of group 9 is an Mg²⁺-activated ribonuclease Bsn (accession: WP_009968119), whose role in the cell remains unknown (76).
In proteins from this group, the His-Me finger domain appears together with a wide range of cell-surface associated domains of an immunoglobulin-like fold, among which the most abundant are fibronectin type III (fn3, PF00041), lamin-tail domain (LTD, PF00932), and bacterial Ig-like domain (Big_2, PF02368). Big_2 and the fibronectin domain have been identified as virulence factors involved in adhesion of pathogenic bacterial strains to host cells (77). In the genomes of various Enterobacteria species, the genes encoding sprT peptidase-like proteins and RNA methyltransferases are located in direct proximity to the His-Me finger genes, suggesting potential involvement of the His-Me finger domain in the virulence machinery.
This group contains endonuclease VII, i.e., a type II restriction enzyme Hpy99I from the gastric pathogen Helicobacter pylori (7), and phage T4 endonuclease VII, which is a Holliday junction resolvase that cleaves T4 DNA before packaging it into the phage head (9). In some uncharacterized members from Actinobacteria, genes encoding the His-Me finger-containing proteins are frequently found downstream of a prokaryotic ubiquitin-like gene whose protein product contains a Pup_ligase domain, or upstream of the proteasome beta subunit gene (Supplementary Figure S4), suggesting a role of these His-Me finger nucleases in the Pup-proteasome system (78).
This group comprises toxins, such as the Rhs family proteins from bacterial polymorphic toxin systems, whose sequences contain the characteristic GH-E sequence motif (48,63). In direct genomic proximity to genes coding for many members of group 14, from a wide variety of bacterial species, are genes that encode uncharacterized YjbI-like proteins. YjbI consists of multiple pentapeptide repeats of unknown function. According to the SecReT6 database (79), the genes from this group are frequently located in genomic regions occupied by genes encoding the type 6 secretion system (T6SS). The His-Me finger may therefore constitute an alternative toxin effector in these bacterial species. Intriguingly, we could not find any annotated adjacent gene whose product would provide immunity against self-detrimental nuclease activity. This may imply that these nucleases are benign, or have an additional, as yet uncharacterized, function as an antitoxin.
This group contains restriction endonuclease-like proteins from various prokaryotic and eukaryotic organisms, including YisB protein from Bacillus subtilis (accession: NP_388947). In the MutS–MutL–MutH pathway of mismatch repair, in the absence of a MutH nuclease homologue, other nicking endonucleases might be recruited to serve as an analog of MutH. YisB has been proposed as a good candidate to fulfil this role in Bacillus species (80). Further experiments have confirmed the role of YisB in DNA repair (81). Our analysis of the genomic neighborhood of yisB in various Bacillus strains consistently finds adjacent genes involved in the DNA repair processes, such as ATP-dependent helicase/deoxyribonuclease subunits A and B with PDDEXK_1 nuclease domains, and ATP-dependent double-strand DNA exonucleases SbcC and SbcD. Although the role of fungal homologs (mostly from Rhizobiaceae) remains elusive, they may perform an equivalent function.
This assembly contains a multitude of group II intron reverse transcriptases/maturases. They include LtrA protein from Lactococcus lactis (accession: WP_011835237), which exhibits three activities: DNA endonuclease for site-specific cleavage of the DNA target site and initiation of mobility, reverse transcriptase for intron duplication, and maturase that aids splicing (82). In some eukaryotes, mostly fungi and algae, proteins from this group are encoded by mitochondria or plastids.
Group 19 contains proteins from the phage Orf family of recombinases from Listeria phages (accession: CAC96489). Listeria phage members of the Orf family have been proposed to be distantly related versions of canonical Orf representatives and may constitute novel single-strand-specific DNases that could have arisen from the fusion of an Orf DNA-binding domain to a His-Me finger nuclease domain (83). Our analyses of the genomic neighborhood support this hypothesis: the corresponding genes are potentially co-transcribed with genes encoding proteins from the SSB (PF00436) and DnaB_2 (PF07261) Pfam families. The products of these genes are involved in replication initiation, which supports a role for these His-Me finger proteins in replication.
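As an illustration of this kind of genomic-neighborhood reasoning, the Python sketch below flags neighboring genes as potentially co-transcribed when they lie on the same strand and are separated by a short intergenic gap. It is not the pipeline used in this study: the `Gene` record, the function name, the 50 bp gap threshold, the gene labels and all coordinates are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def cotranscribed_neighbors(genes: List[Gene], query: str, max_gap: int = 50) -> List[str]:
    """Return genes chained to `query` by same-strand, short-gap adjacency.

    `max_gap` (bp) is an assumed threshold, not a value taken from the study.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next(i for i, g in enumerate(ordered) if g.name == query)
    strand = ordered[idx].strand
    linked = []
    # Walk upstream (-1) and downstream (+1) until the chain is broken.
    for step in (-1, 1):
        i = idx
        while 0 <= i + step < len(ordered):
            a = ordered[min(i, i + step)]
            b = ordered[max(i, i + step)]
            gap = b.start - a.end - 1  # negative if the genes overlap
            if ordered[i + step].strand != strand or gap > max_gap:
                break
            linked.append(ordered[i + step])
            i += step
    return [g.name for g in sorted(linked, key=lambda g: g.start)]


# Toy locus with made-up coordinates.
locus = [
    Gene("ssb", 100, 600, "+"),
    Gene("orf_hnh", 620, 1400, "+"),
    Gene("dnaB_2", 1410, 2700, "+"),
    Gene("unrelated", 3500, 4200, "-"),
]
print(cotranscribed_neighbors(locus, "orf_hnh"))
```

A same-strand, short-gap criterion is only a crude proxy for co-transcription; real operon prediction would normally draw on additional evidence such as promoter and terminator positions or expression data.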
This predominantly viral group contains a few bacterial and archaeal representatives, e.g. the rare-cutting restriction endonuclease PacI from Pseudomonas alcaligenes, whose structure is known (PDB ID: 3ldy). Interestingly, this REase appears to lack a coupled methyltransferase, so protection from self-cleavage stems from the absence of recognition sequences in the P. alcaligenes genome (20). The His-Me finger domain is coupled with an HTH-like domain at the N-terminus, which may contribute to DNA-binding specificity. Viral members of group 20 include proteins from giant viruses: Paramecium bursaria chlorella virus and Pandoravirus species. It has been reported that the genome of Paramecium bursaria chlorella virus 1 encodes multiple site-specific restriction endonucleases of other folds, capable of degrading the host chromosome in the early stages of infection (77,78). It is therefore tempting to speculate that the viral homologs of PacI might be additional viral restrictases.
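The self-protection argument for PacI can, in principle, be checked by scanning a genome for the enzyme's recognition site. The Python sketch below is a minimal illustration rather than a procedure used in this work. It assumes the PacI site 5'-TTAATTAA-3' and runs on short toy sequences, not on a real P. alcaligenes genome; the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def count_sites(genome: str, site: str) -> int:
    """Count occurrences of `site` on both strands, allowing overlaps.

    A palindromic site equals its own reverse complement, so the set
    below collapses to one pattern and each site is counted once.
    """
    genome = genome.upper()
    site = site.upper()
    targets = {site, reverse_complement(site)}
    count = 0
    for target in targets:
        start = genome.find(target)
        while start != -1:
            count += 1
            start = genome.find(target, start + 1)
    return count


PACI_SITE = "TTAATTAA"  # assumed PacI recognition sequence (palindromic)

# Toy sequences for illustration only.
toy_with_site = "GCGCTTAATTAAGCGC"
toy_without_site = "GCGCGGCCGGCCGCGC"
print(count_sites(toy_with_site, PACI_SITE))
print(count_sites(toy_without_site, PACI_SITE))
```

A count of zero over a complete genome would be consistent with protection by site avoidance, although it would not by itself rule out other protective mechanisms.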
This group encompasses the Enpp1, Enpp2 and Enpp3 transmembrane glycoproteins, which hydrolyze extracellular nucleotide triphosphates to produce pyrophosphate (Enpp1, Enpp3) (84,85) or generate the lipid mediator lysophosphatidic acid (Enpp2/autotaxin) (86). These proteins typically contain four domains: a catalytic phosphodiesterase domain, a nuclease-like His-Me finger domain, and two somatomedin B (SMB)-like domains. Both the catalytic phosphodiesterase and His-Me finger domains contribute to structural stability (84), whereas the His-Me finger additionally participates in substrate binding (45). The nuclease-like domain retains neither the catalytic histidine nor its tyrosine counterpart and is presumed to be catalytically inactive. Interestingly, this cluster also includes proteins that function as venom phosphodiesterases in snakes and a variety of marine species (accession: J3SBP3) (87).
This group contains unannotated bacterial proteins that are encoded in direct proximity to genes coding for MazG family proteins. In E. coli, MazG is a regulator of programmed cell death under conditions of nutritional stress. It possesses nucleotide pyrophosphohydrolase activity, which is inhibited by the MazE–MazF toxin–antitoxin module. Within the module, the MazF toxin shows sequence- and single-strand-specific endoribonuclease activity and targets cellular mRNA in response to nutritional stress (88). The presence of a His-Me finger-containing protein encoded close to the MazG-coding gene suggests the involvement of an additional endonuclease domain in this complex regulatory system.
This group comprises small bacterial and phage proteins. It contains Rap (recombination adept with plasmid) proteins (accession: NP_040639), which increase λ-by-plasmid recombination catalyzed by the native recombination pathway of the host bacterium (73). Rap is a structure-specific endonuclease that targets and nicks D-loops formed by recombination intermediates (89). It has also been reported to function as a Holliday junction resolvase acting at the crossover of DNA substrates (90). Although its crystal structure is not known, our predictions suggest that the Rap protein comprises a C-terminal His-Me finger within a full treble clef domain, with an additional zinc finger followed by a predicted HTH-like domain at the N-terminus. Therefore, its specificity towards branched DNA substrates is probably achieved via additional structural elements.
This cluster includes SphI restriction endonuclease (accession: AAB40378), which belongs to the HNH-CIDE toxin family (48). It is hypothesized that the bacterial SphI nuclease domain arose in the milieu of bacterial competition, and has been horizontally transferred to animals (48). An animal CIDE (CAD/DFF40) protein (which belongs to group 35) is involved in DNA fragmentation during apoptosis. We propose that, similarly to PacI, the predicted HTH-like domain preceding the His-Me finger confers specificity to this restriction endonuclease.
Group 32 contains Hcp1 family type VI secretion system effector proteins, such as the hypothetical YhhZ from E. coli (accession: NP_417899). The type VI secretion system (T6SS) is used by Gram-negative bacteria to deliver effector proteins to eukaryotic cells, where they serve as virulence factors, and to neighboring prokaryotic cells to mediate competition (91). The C-terminal region of the effector proteins is usually occupied by domains responsible for effector activity: preferentially peptidoglycan hydrolases and phospholipases, but also endonucleases, including the His-Me finger. It has been suggested that HNH domains can mediate antibacterial toxicity in T6SS (92), indicating that the hypothetical YhhZ protein is, in fact, a His-Me finger nuclease.
The His-Me finger domain is a relatively small structural motif often embedded in the scaffolds formed by large domain architectures in proteins playing roles in various cellular activities. Predominantly, the His-Me finger exhibits endonuclease activity. However, the superfamily also includes a few examples of exonucleases and proteins that have undergone active site deterioration and, consequently, have lost their catalytic potential while retaining the ability to bind DNA. The origin and evolution of the His-Me finger domain remain elusive. Previous studies have suggested that the ancient treble clef fold might have been a common ancestor of HNH proteins (1,93). On the other hand, the presence of this domain in evolutionarily unrelated protein scaffolds, along with its overall low structural complexity, advocates the case for convergent evolution, in which independent evolutionary lines have resulted in a similar catalytic solution. However, it is tempting to speculate that a fold with such a narrow range of functions must have emerged from a single primordial structural motif.
In general, the rigid His-Me finger fold does not undergo any significant conformational change upon binding to the substrate (21,94). Nonetheless, in the RNA-guided DNA endonuclease Cas9 from Streptococcus pyogenes and Actinomyces naeslundii, the HNH active site is disordered in the apo state and might become ordered upon nucleic acid binding (95,96). The α-helix of the ββα domain lies parallel to the DNA minor groove and predominantly contacts the phosphate backbone, which results in significant bending and widening of the minor groove downstream of the cleavage site, as observed in the complex of the homing endonuclease I-HmuI with its cognate DNA substrate (10,18).
The limited size of the HNH domain is a double-edged sword. On the one hand, its compactness allows for accommodation in multiple architectures and facilitates the secretion of the nuclease effector in various virulent and toxic activities. The small size is also required for homing endonucleases that are limited by mobility constraints (5). On the other hand, the domain alone is presumably not able to recognize the substrate specifically and thus requires additional features to gain specificity. The His-Me finger ββα catalytic core interacts with the target nucleic acid backbone to reach the scissile phosphate; therefore, it does not contribute significantly either to the sequence specificity or to the fidelity of these enzymes (10). Moreover, most sequence-specific interactions are thought to occur via the major groove of DNA, which is more accessible for direct readout by additional structural elements associated with the His-Me finger (97,98). In turn, the α-helix of the His-Me finger contacts the DNA backbone from the side of the minor groove, which explains its sequence independence. Additionally, the domain makes only a few contacts with the DNA phosphate backbone, which might be another reason for its non-specificity (21,94). In order to achieve site-specific cleavage, His-Me finger enzymes incorporate a broad range of additional DNA-binding domains, such as various versions of HTH domains. Proteins that are highly specific, restriction endonucleases in particular, usually possess at least one tethered DNA recognition domain, in contrast to single-domain colicins or periplasmic nucleases, which cleave the target sequence with little, if any, specificity (21,99).
His-Me finger specificity comes in many flavours. Among His-Me finger representatives, one may find both site-specific enzymes, such as I-HmuI, and proteins that show little sequence specificity, such as colicin DNases. Colicins (e.g. ColE7) exhibit a universal mechanism of action: they cleave both single-strand and double-strand DNA and RNA without apparent base preference (100). On the other hand, some proteins show DNA-structure specificity, e.g. endonuclease VII prefers structurally perturbed DNA, including branched and mismatched substrates, Holliday junctions and related forms (101), while the RecA-dependent nuclease Ref makes targeted double-strand breaks within a displacement loop formed by RecA (102,103).
The His-Me finger domain can be found on both sides of inter- and intra-organismal competition. It acts as a pathogenic module that, by degrading the infected organism's genetic material, leads to cell death. On the other hand, as an anti-pathogenic entity, it can damage the invading DNA and protect the host from the deleterious activity of the invader. Its overall structural compactness and intrinsic non-specificity make it a perfectly tailored pathogenic module that successfully targets competing organisms by various modes of action. We identified a number of groups that contain nucleic acid-targeting toxins in toxin–antitoxin systems and restriction endonucleases in restriction–modification systems. As reflected by this work, these systems exhibit rapid evolution and frequent lateral gene transfers. However, since the His-Me finger domain itself exhibits low substrate and site specificity, its nuclease activity would be extremely toxic to the cell. Various mechanisms have evolved to deal with the detrimental activity of His-Me finger endonucleases. One such mechanism is the enrichment of disulfide bonds in proteins containing the His-Me finger domain, which renders the nuclease inactive (104) and allows for the activation of the enzyme upon extrusion from the reducing intracellular environment to the oxidizing extracellular one (105). Along with other secretion-related domains and immunity proteins, the His-Me finger is a part of a cellular secretion machinery able to kill competing strains. In groups that comprise bacterial toxins, we observed multiple pre-toxin domains coupled with the His-Me finger. The pre-toxin domains possess adhesive properties that help in displaying the C-terminal toxin domain on the cell surface and thus in secreting the virulence factor.
The authors thank Roman Laskowski for proof-reading the manuscript.
Supplementary Data are available at NAR Online.
Foundation for Polish Science [TEAM to K.G.]; Polish National Science Centre [2011/02/A/NZ2/00014 and 2014/15/B/NZ1/03357 to K.G.]. Funding for open access charge: Polish National Science Centre [2014/15/B/NZ1/03357].
Conflict of interest statement. None declared.