Catalytic domains of Type II restriction endonucleases (REases) belong to a few unrelated three-dimensional folds. While the PD-(D/E)XK fold is the most common among these enzymes, crystal structures have also been determined for single representatives of two other folds: PLD (R.BfiI) and half-pipe (R.PabI). Bioinformatics analyses supported by mutagenesis experiments suggested that some REases belong to the HNH fold (e.g. R.KpnI), and that a small group represented by R.Eco29kI belongs to the GIY-YIG fold. However, for a large fraction of REases with known sequences, the three-dimensional fold and the architecture of the active site remain unknown, largely because extreme sequence divergence hampers the detection of homology to enzymes with known folds.
R.Hpy188I is a Type II REase with unknown structure. PSI-BLAST searches of the non-redundant protein sequence database reveal only one homolog (R.HpyF17I, which has a nearly identical amino acid sequence and the same DNA sequence specificity). Standard application of state-of-the-art protein fold-recognition methods failed to detect any relationship between R.Hpy188I and proteins of known structure or other protein families. To increase the amount of evolutionary information in the multiple sequence alignment, we expanded our sequence database searches to include sequences from metagenomics projects. This search identified 23 additional members of the R.Hpy188I family, from both the metagenomic and the non-redundant databases. Fold-recognition analysis of the extended R.Hpy188I family then revealed its relationship to the GIY-YIG domain and enabled computational modeling of the R.Hpy188I structure. Analysis of the R.Hpy188I model in light of sequence conservation among its homologs revealed an unusual variant of the active site, in which the typical Tyr residue of the YIG half-motif is substituted by a Lys residue. In addition, in some homologs the otherwise invariant Arg residue occupies a non-homologous position in the sequence, yet this placement still allows spatial conservation of its guanidino group, which is potentially involved in phosphate binding.
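The abstract does not state how conservation among homologs was quantified. One common generic measure is the per-column Shannon entropy of a multiple sequence alignment, where lower entropy marks more conserved positions. The sketch below illustrates only that generic idea; the function names and the toy alignment are hypothetical and are not R.Hpy188I data or the authors' pipeline.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column; gaps ('-') are ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conservation_profile(alignment: list[str]) -> list[float]:
    """Per-column entropy for equal-length aligned sequences (lower = more conserved)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


# Hypothetical toy alignment (NOT real REase sequences):
# column 0 is an invariant R, column 1 mixes Y and K, column 2 is mostly G.
toy_alignment = ["RYG", "RKG", "RKA", "RYG"]
profile = conservation_profile(toy_alignment)
```

Here the invariant first column scores 0 bits, while the Y/K column scores 1 bit. A real analysis would typically also weight sequences to reduce redundancy and account for amino acid similarity, which this sketch deliberately omits.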
The present study fills a significant "white spot" on the structural map of REases. It also provides important insight into sequence-structure-function relationships in the GIY-YIG nuclease superfamily. Our results show that for proteins with no or few detectable homologs in the standard "non-redundant" database, it is useful to expand the search database with metagenomic sequences, which may provide the evolutionary linkage needed to detect more remote homologs.