A separate family of enzymes within the metallo-β-lactamase fold comprises several important proteins acting on nucleic acid substrates, involved in DNA repair (Artemis, SNM1 and PSO2) and RNA processing [cleavage and polyadenylation specificity factor (CPSF) subunit]. Proteins of this family, named β-CASP after the names of its representative members, possess specific features relative to those of other metallo-β-lactamases, that are concentrated in the C-terminal part of the domain. In this study, using sensitive methods of sequence analysis, we identified highly conserved amino acids specific to the β-CASP family, some of which were unidentified to date, that are predicted to play critical roles in the enzymatic function. The identification and characterisation of all the extant, detectable β-CASP members within sequence databases and genome data also allowed us to unravel particular sequence features which are likely to be involved in substrate specificity, as well as to describe new but as yet uncharacterised members which may play critical roles in DNA and RNA metabolism.
Metallo-β-lactamase fold proteins constitute a large superfamily of proteins possessing a wide variety of substrates, most of them having in common an ester linkage and a negative charge (1). Besides class B β-lactamases hydrolysing lactams, this superfamily includes, among others, glyoxalase II, aryl sulfatases, cytidine monophosphate-N-acetyl neuraminic acid (CMP-NeuAc) hydrolases, cAMP phosphodiesterases and the phnP protein, involved in alkylphosphonate uptake (1–3). A separate group within the metallo-β-lactamase fold superfamily comprises proteins with nucleic acid substrates. Among these, the 73 kDa subunit of cleavage and polyadenylation specificity factor (CPSF) and its yeast orthologue Ysh1p are involved in RNA processing, whereas mouse SNM1 and yeast PSO2 are implicated in DNA crosslink repair (1). Artemis, a novel member of this group, has recently been shown to be involved in V(D)J recombination/DNA repair, and its mutations cause human severe combined immune deficiency with increased radiosensitivity (RS-SCID) (4). Following this first characterisation, Artemis was also shown to possess intrinsic single-strand-specific 5′ to 3′ exonuclease activity which is modified to an endonuclease activity when Artemis forms a complex with the DNA-dependent protein kinase DNA-PKcs (5), consistent with its presumed catalytic activity (4).
The metallo-β-lactamase fold consists of a four-layered β-sandwich with two mixed β-sheets flanked by α-helices, with the metal-binding sites located at one edge of the β-sandwich (Fig. 1) (6). The dinuclear Zn(II) centre, used to perform the cleavage reaction, is located at the bottom of a wide shallow groove (Fig. 1) (6). Five sequence motifs, consisting mostly of histidine and aspartic acid residues, are highly conserved in active enzymes of the superfamily and participate in zinc coordination and the hydrolysis reaction. The first two motifs are located at the end of two β-strands of the first β-sheet (red and yellow in Fig. 1). Motif 2 (yellow in Fig. 1) is typical of the entire superfamily and is typified by the highly conserved HxHxDH signature, in which the first histidine and the aspartic acid are invariant. The third and fifth motifs, ending strands of the second β-sheet (pink and green in Fig. 1), each contain a conserved histidine whereas the fourth one, also located at the end of an in-between β-strand, contains an acidic residue or a cysteine (violet in Fig. 1). However, as noticed by Aravind (1) and Daiyasu et al. (3), the length between motifs 4 and 5 is predicted to be extremely variable, and particularly large in metallo-β-lactamases acting on nucleic acids, but these authors disagree on the location of the catalytic histidine of motif 5 within this particular family.
Prompted by our interest in the Artemis function in DNA repair, we undertook a detailed analysis of its sequence C-terminal to motif 4, up to which the similarity with proteins of the metallo-β-lactamase superfamily can be significantly detected against domain databases. Indeed, motif 5 could not be easily identified in Artemis. Instead, this C-terminal sequence shares obvious similarity with yeast PSO2 and mouse SNM1 and, to a lesser extent, with the 73 kDa subunit of CPSF (4). The conserved sequences lying after the four typical metallo-β-lactamase motifs (motifs 1–4) therefore constitute a hallmark of proteins of this family specifically acting on nucleic acids. They are not restricted to a limited region, the length of which should correspond to that including the metallo-β-lactamase motif 5, but share features of a distinct globular domain that we named the β-CASP motif, after metallo-β-lactamase-associated CPSF Artemis SNM1/PSO2. Using a combination of profile-based and bidimensional methods of sequence analysis, we highlighted all detectable extant sequences that belong to the ‘β-CASP family’ in the three primary kingdoms (eukaryotes, bacteria and archaea) and found several conserved motifs, some of which have not yet been described. These highly conserved motifs, including two histidines and an acidic residue, are likely to play a key role in the structure and/or function of this family within the metallo-β-lactamase superfamily. Moreover, specific features of these motifs allow enzymes acting on DNA substrates to be distinguished from those involved in RNA metabolism. The comprehensive sequence analysis presented here is all the more useful for the characterisation of the β-CASP family functions as sequence divergence hampers its fully automatic description.
Domain database searches were performed using RPS-BLAST 2.2.1 running at the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and an HMM search running at the Sanger Centre (http://www.sanger.ac.uk/Software/Pfam/search.shtml).
The non-redundant database (NRDB) at NCBI was searched using PSI-BLAST (7). We also used the bidimensional hydrophobic cluster analysis (HCA) (8,9), which offers the possibility to add information about secondary structures to the lexical analysis of the considered sequences. The sequence is handled on a duplicated α-helical net in which hydrophobic amino acids (V, I, L, F, M, Y, W) are contoured. The defined hydrophobic clusters [i.e. hydrophobic amino acids that are not separated from each other by at least four non-hydrophobic residues (connectivity distance linked to the use of an α-helical support)] were shown to mainly correspond to the internal faces of regular secondary structures (α-helices or β-strands) (10). Conservation of hydrophobic cluster features which participate in the protein core, together with a similar texture of sequence similarities, is associated with the maintenance of a similar structure and often helps and/or allows the alignment procedure for highly divergent sequences (typically in the 10–20% sequence identity range, below the so-called ‘twilight zone’ of 25–30% identity). The sensitivity of this approach, combined with profile-based lexical tools, has often been successfully used to identify new domains (11,12) and/or to link orphan sequences to particular structural and functional families (13–15).
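As a rough illustration of how hydrophobic clusters can be delineated, the sketch below works on the linear sequence only. It assumes the usual HCA convention that hydrophobic residues (V, I, L, F, M, Y, W) belong to the same cluster unless four or more non-hydrophobic residues lie between them. It is a one-dimensional simplification and not the HCA program itself: the duplicated α-helical net and any special treatment of proline are left out, and the function name `hydrophobic_clusters` is hypothetical.

```python
# Hydrophobic alphabet used for HCA in this section.
HYDROPHOBIC = set("VILFMYW")

# Assumed connectivity distance: a run of four or more
# non-hydrophobic residues closes the current cluster.
CONNECTIVITY = 4


def hydrophobic_clusters(seq, connectivity=CONNECTIVITY):
    """Return 0-based inclusive (start, end) spans of hydrophobic clusters.

    Simplified 1D approximation: the 2D helical-net neighbourhood
    used by real HCA plots is not modelled here.
    """
    clusters = []
    start = last = None
    for i, aa in enumerate(seq.upper()):
        if aa not in HYDROPHOBIC:
            continue
        if start is None:
            # First hydrophobic residue opens the first cluster.
            start = last = i
        elif i - last - 1 >= connectivity:
            # Gap is long enough: close the cluster and open a new one.
            clusters.append((start, last))
            start = last = i
        else:
            # Gap is short: extend the current cluster.
            last = i
    if start is not None:
        clusters.append((start, last))
    return clusters


if __name__ == "__main__":
    # Toy sequence (not a real protein): two clusters separated by six residues.
    print(hydrophobic_clusters("GVILSGGGGSFAW"))
```

The spans returned this way only indicate where candidate regular secondary structures may lie; comparing cluster shapes between sequences, as done on HCA plots, still requires visual or two-dimensional analysis.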
The Artemis sequence was searched against domain databases using two different programs: HMM search and RPS-BLAST. A metallo-β-lactamase domain (Pfam00753; 16) was highlighted between amino acids 5 and 173 after the HMM search (E-value 0.062) or amino acids 21 and 145 with RPS-BLAST (E-value 8 × 10⁻⁵). The alignment performed using RPS-BLAST ends after the conserved Gly–Asp sequence (motif 4) whereas the alignment obtained using the HMM search is much longer. However, in this last case, the conserved histidine residue of the metallo-β-lactamase family (motif 5) is missing in Artemis, being aligned with a phenylalanine, which cannot substitute for histidine in zinc binding. Moreover, the C-terminal region of the HMM alignment (amino acids 142–173), after the conserved Gly–Asp sequence (motif 4), shares much less sequence identity/similarity with the Pfam metallo-β-lactamase profile, indicating that this part of the alignment is probably fortuitous. This observation prompted us to analyse further the sequences after motif 4, with the particular aim of identifying the missing conserved histidine (motif 5) of the metallo-β-lactamase signature.
Thus, we searched for similarities of the Artemis sequence with other proteins after motif 4 and up to amino acid 370 (ending a globular domain, as indicated after HCA, star in Fig. 2B). Using this query (amino acids 142–370), PSI-BLAST searches against the NRDB (891 607 sequences) at NCBI revealed by iteration 1 significant similarities with several proteins including mouse SNM1. Further iterations highlighted similarities with other proteins, including yeast PSO2 [17 significant matches (E-value <0.002) at convergence by iteration 3]. Marginal but interesting similarities were also observed just above the threshold E-value (E = 1.5) with a hypothetical protein from Sulfolobus solfataricus, described as a putative mRNA 3′-end polyadenylation factor (SSO0761, see Table 3). This similarity was supported at the structural level as a true relationship using HCA (see Materials and Methods for details; Fig. 2B). Moreover, a metallo-β-lactamase signature was also detected just before the similarity region using RPS-BLAST (Fig. 2A). Thus, the S.solfataricus sequence was included in a new position specific score matrix (PSSM), which was then used for additional iterations. This led to the identification at convergence by iteration 16 of numerous proteins, including various hypothetical prokaryotic proteins, the 73 kDa subunit of CPSF from mammals (iteration 4) and yeast (Ysh1, iteration 5) as well as the 100 kDa subunit of CPSF from mammals and from fission yeast (iteration 8) (Tables 1–3).
The similarities of the detected sequences were confirmed by reciprocal iterative strategies, sometimes extending the detected family to other members, as in the case of the Saccharomyces cerevisiae Cft2/Ydh1 sequence, orthologous to the mammalian CPSF 100 kDa subunits. Most of the matching proteins possess, before the detected regions, sequences belonging to the metallo-β-lactamase superfamily, as detected against the Pfam database (presence of motifs 1–4). Those sequences that do not match the Pfam metallo-β-lactamase profile nevertheless possess a metallo-β-lactamase fold, as shown by global similarity searches, but have lost some or all of the conserved amino acids of the four consecutive motifs (Tables 1–3). This observation suggests that the considered region, ranging from amino acids 142 to 370 in Artemis and specific to the defined β-CASP family (and for this reason, named the ‘β-CASP’ motif), does not form an independent domain, but rather should be associated with, or even integrated into, the metallo-β-lactamase domain to play a particular role, probably relative to nucleic acids as all of the characterised detected members appear specific for this kind of substrate (see below). The addition of N-terminal sequences to the β-CASP signature of Artemis in a new database search did not highlight new members of this family that we could have missed using the β-CASP motif stricto sensu.
An in-depth analysis of all the identified members of the β-CASP family was then undertaken, in particular with the aim of refining the alignments proposed by PSI-BLAST and of identifying conserved residues that could play a critical role in their function.
HCA was used to manually localise on the bidimensional level (secondary structure context) conserved motifs, as proposed in the PSI-BLAST results (Figs 2 and 3 and Tables 1–3). Several major anchor points of the alignment were highlighted, each consisting of conserved hydrophobic residues gathered into a cluster, which represents the internal face of a conserved regular secondary structure, often accompanied (upstream, within and downstream) by identical or highly conserved non-hydrophobic residues. Motif A is characterised by an acidic residue (D or E) after a stretch of hydrophobic residues typical of a β-strand structure [Φ-Φ-Φ-(D,E)-(T,S)-T, where Φ is a hydrophobic amino acid]. Motif B includes a histidine ending an amphiphilic β-strand structure and followed by an α-helical structure. C-terminal to this last α-helix, and at the end of another predicted β-strand, a conserved histidine (motif C) can also be found in all the sequences of the β-CASP family, with the exception of a few sequences including those of the Artemis/SNM1/PSO2 group in which this histidine is most often substituted by a valine (red circle in Fig. 2B). Other conserved motifs were detected along the compared sequences, but none include highly conserved polar residues.
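To make the motif A signature concrete, the sketch below reads it as three hydrophobic residues followed by (D,E)-(T,S)-T and scans a sequence for exact matches. This is only a strict lexical approximation under stated assumptions: the hydrophobic alphabet (V, I, L, F, M, Y, W) is borrowed from the HCA definition given in Materials and Methods, the function name `find_motif_a` is hypothetical, and divergent family members would be missed by such a pattern, which is why the analysis relied on HCA and PSI-BLAST rather than on fixed patterns.

```python
import re

# Assumed reading of the motif A signature:
# three hydrophobic residues, then D/E, then T/S, then T.
# The lookahead lets overlapping candidates be reported.
MOTIF_A = re.compile(r"(?=([VILFMYW]{3}[DE][TS]T))")


def find_motif_a(seq):
    """Return (1-based start position, matched segment) for each candidate."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF_A.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence for illustration only, not taken from a real protein.
    print(find_motif_a("GSAVLIDSTGH"))
```

A hit from such a scan is only a candidate: whether it lies at the end of a predicted β-strand, as motif A does, has to be checked against the secondary structure context.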
The three conserved polar amino acids characterising motifs A, B and C are all located at the end of predicted β-strands, like all of the zinc-binding residues of canonical metallo-β-lactamases (Fig. 1). This particular position together with their conservation suggests that they could be located in the vicinity of the metallo-β-lactamase active site and that they could play a specific role in the probable enzymatic function of the defined β-CASP family. Accordingly, mutation of the Asp165 residue (motif A) in Artemis, as well as that of His319 (motif B), strongly compromises both the V(D)J recombinase activity and the in vitro endonuclease activity (5; our unpublished observations).
Motif B histidine is conserved in all members of the β-CASP family, in contrast to motif C histidine which is substituted by a valine in the Artemis/SNM1/PSO2 group. Therefore, it is possible that motif C histidine, which is otherwise conserved for CPSF 73 kDa and almost all of the CPSF 100 kDa (with the exception of yeast Ydh1p), may play an important role in the specificity of members of the β-CASP family towards RNA targets. It is worth noting that it is also conserved for most of the bacterial β-CASP members, suggesting that these as yet uncharacterised proteins could act specifically on RNA as well (Tables 2 and 3).
Interestingly, our database mining also revealed an as yet uncharacterised group of proteins with a non-histidine motif C (Table 1), which shares similarities with, but is distinct from, Artemis and SNM1 (the human β-CASP sequence of this protein shares 22 and 35% sequence identity with Artemis and human SNM1, respectively). This protein has been previously named SNM1C (17–19). According to our observations, the SNM1C group can thus be predicted to play a role in DNA processing, which remains to be experimentally investigated.
In some cases, the pairwise similarities proposed in the PSI-BLAST results, although ranging beyond motif 4 of the metallo-β-lactamase signature, were patchy and did not always encompass conserved motifs of the emerging multiple alignment (e.g. in the PSI-BLAST results relative to the Schizosaccharomyces pombe CPSF 100 kDa: motifs B and C are missing, Fig. 2B). This observation suggests that (i) the lexical procedure used by PSI-BLAST for alignment was not sufficient in itself to align highly divergent sequences, (ii) large insertions or deletions could interfere with the recognition of conserved motifs, and/or (iii) motifs could really be absent in some proteins. Thus, HCA was also used to refine the proposed relationships, as illustrated in Figure 2B with the particular case of proteins of the CPSF 100 kDa family. On the one hand, these alignments do indeed tolerate very large insertions just before motif B, which hampers alignment beyond, at least in the case of the S.pombe CPSF 100 kDa sequence, although clusters typical of motifs B and C can be easily detected farther on. On the other hand, these sequences have lost in part or totally the highly conserved residues of the different motifs, although the global fold of the domain was conserved. The histidine of motif B is indeed not present in any of the three CPSF 100 kDa sequences whereas histidine of motif C is only absent in the yeast sequence (see also Table 1). The absence of critical residues is also found in motifs 1–4 of the metallo-β-lactamase domain preceding the aligned region (Fig. 2A), reinforcing the hypothesis that conserved residues of motifs A–C, together with motifs 1–4, play a key role in the probable enzymatic function of the β-CASP family. This function should be lost in some proteins of the family, including the CPSF 100 kDa subunits.
The missing motifs, highlighted using HCA, were confirmed as true relationships by using them as queries in PSI-BLAST searches in a ‘reverse’ strategy.
Our initial aim was to identify within the β-CASP family a highly conserved amino acid which could be located at a position relative to the active site similar to that of the motif 5 histidine of canonical metallo-β-lactamases (Fig. 1). As stated above, we identified not one but three highly conserved amino acids, suggesting that the active site of members of the β-CASP family could accommodate more critical residues than those of canonical metallo-β-lactamases.
Histidine of motif B appears to be the best candidate to correspond to motif 5 histidine of canonical metallo-β-lactamases, as already noticed by Aravind (1), since it is conserved in all of the members of the β-CASP family, in contrast to motif C histidine. If the sought-after motif 5 histidine actually corresponds to the highlighted motif B histidine, sequences intervening between motifs 4 and 5 should correspond to a large insertion within the metallo-β-lactamase domain. Another possibility is that no large insertion should occur between metallo-β-lactamase motifs 4 and 5 and that motif 5 should actually correspond to the highlighted motif A. Both hypotheses are in fact supported by the loss of Artemis activity in Asp165 and His319 mutants (5; our unpublished observations). Like motif 5, corresponding to strand β12 and the subsequent loop of the β-lactamase structure shown in Figure 1, motif A also corresponds to a β-strand ended by a conserved polar residue. Moreover, like motif 5 in canonical metallo-β-lactamase structures, motif A is separated from motif 4 by a short peptide including a helix (Figs 2B and 3; helix α4 in Fig. 1). Regarding this hypothesis, the canonical motif 5 histidine should, however, be substituted by an acidic residue and sequences located after motif A should correspond to a distinct additional domain accompanying the metallo-β-lactamase domain. This hypothesis is further supported by a comparison of the three-dimensional structure of canonical metallo-β-lactamases with that of glyoxalase II (20), in which the strand ended by motif 5 histidine is followed by a distinct globular domain, differing from the C-terminal helix of metallo-β-lactamases. Regarding this possibility, it could be hypothesised that substrates of the β-CASP family bind at the domain interface, as substrates of glyoxalase II do.
The β-CASP family described here is a very large family including proteins of the three primary kingdoms (eukaryotes, bacteria and archaea) and appears to be specialised towards nucleic acids, as suggested by functional data gained for some members or by domains accompanying the metallo-β-lactamase domain. Some archaeal ORFs indeed possess N-terminal KH domains (1), known as RNA-binding motifs (21), whereas an Arabidopsis ORF of the Artemis/SNM1/PSO2 group has a module homologous to the eukaryotic DNA ligase I downstream of the metallo-β-lactamase/β-CASP domain (Tables 1 and 3).
On the basis of an in-depth sequence analysis, we showed that members of the β-CASP family that specifically interact with DNA targets can be distinguished from those involved in RNA metabolism regarding the nature of a particular amino acid included in a conserved sequence motif (motif C), which is always a histidine in RNA-specific proteins, whereas it is substituted by a hydrophobic amino acid in proteins acting on DNA. This distinctive sequence feature can usefully be considered for functional characterisation in wide-scale genome analyses.
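The distinction drawn above can be written down as a simple decision rule, sketched below for illustration. It assumes that the residue occupying the motif C position has already been extracted from a curated alignment (the step that actually requires HCA and PSI-BLAST), and the function name `predict_substrate` and the hydrophobic alphabet (taken from the HCA definition) are assumptions of this sketch. Members predicted to be inactive, which may have lost the conserved residues, are not covered by the rule.

```python
# Hydrophobic alphabet as used for HCA (assumption of this sketch).
HYDROPHOBIC = set("VILFMYW")


def predict_substrate(motif_c_residue):
    """Predict the nucleic acid substrate from the motif C residue.

    'H'          -> "RNA" (histidine conserved in RNA-specific members)
    hydrophobic  -> "DNA" (e.g. valine in the Artemis/SNM1/PSO2 group)
    anything else -> "undetermined"
    """
    aa = motif_c_residue.upper()
    if aa == "H":
        return "RNA"
    if aa in HYDROPHOBIC:
        return "DNA"
    return "undetermined"


if __name__ == "__main__":
    for residue in ("H", "V", "K"):
        print(residue, predict_substrate(residue))
```

In a genome-wide survey, such a rule would only give a first-pass annotation to be checked against the rest of the alignment and, ultimately, against experimental data.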
The recently identified Artemis protein is part of the DNA double-strand break (DSB) repair machinery, as inferred from the phenotype of patients with RS-SCID who possess defects in V(D)J recombination leading to an early arrest of B- and T-cell maturation (4). Given the potential enzymatic function of its metallo-β-lactamase/β-CASP domain, it has been hypothesised that Artemis could be involved in the opening of hairpin-sealed coding ends, as generated by the RAG1/RAG2 complex (4). Hence, Artemis could be integrated in the DNA non-homologous end joining cascade, in addition to the Ku70/Ku80 complex, DNA-PKcs subunit and XRCC4/DNA ligase 4. Recently, Ma et al. (5) demonstrated that Artemis does indeed possess a hydrolase catalytic activity. Moreover, when complexed to, and phosphorylated by, DNA-PKcs, Artemis is capable of opening and processing hairpin structures generated by RAG1 and RAG2. This activity is strictly dependent on the continuous association of Artemis with DNA-PKcs. The metallo-β-lactamase/β-CASP domain of Artemis is located at the N-terminus and is followed by a large, essentially non-globular domain which does not share any obvious similarity with other proteins. In contrast, SNM1 and PSO2 have a C-terminal metallo-β-lactamase/β-CASP domain that is preceded by a large N-terminal sequence including a zinc-finger domain. SNM1 and PSO2 are also involved in DNA repair, but they are specialised in DNA interstrand crosslink repair and are not sensitive to ionising radiation (17,18). Here again, their precise role is not yet known, but it could be hypothesised that their possible enzymatic activity involves DNA cleavage to help remove the crosslink.
As mentioned in the Results, we found evidence in the human genome of a third sequence [named SNM1C by Dronkert et al. (17,18) and Wood (19)] which, like Artemis and SNM1, possesses the hallmark of an enzyme with a DNA substrate (a valine at the motif C position) (Table 1). The corresponding uncharacterised protein may thus play an important role in DNA repair, which has yet to be uncovered. This SNM1C protein is conserved in human, mouse, Caenorhabditis elegans and Arabidopsis thaliana but is apparently absent from yeast, suggesting an activity specific to multicellular organisms.
Predicted active members of the β-CASP family also include the 73 kDa subunit of the mammalian CPSF and its yeast orthologue, Ysh1/Brr5. Mammalian CPSF, composed of four subunits (30, 73, 100 and 160 kDa), plays a central role in the endonucleolytic cleavage and the polyadenylation of 3′ ends of most eukaryotic messenger RNAs (mRNAs) (22,23). It recognises AAUAAA hexanucleotides found upstream of the polyadenylation site via the 160 kDa subunit. The exact role of CPSF 73 kDa/Ysh1 in mRNA processing remains largely unknown. Consistent with its predicted ‘active’ metallo-β-lactamase/β-CASP domain, it could also be directly involved in an enzymatic function. This hypothesis is supported by a Brr5/Ysh1 mutant identified in a screen for cold-sensitive pre-mRNA splicing mutants (24). Moreover, depletion of Brr5/Ysh1 resulted in inhibition of both cleavage and polyadenylation (24). The two CPSF subunits, CPSF 73 kDa/Ysh1 and CPSF 100 kDa/Ydh1, may have evolved from a common ancestor, as suggested by the similarities that these proteins share within their metallo-β-lactamase/β-CASP domain. However, CPSF 100 kDa/Ydh1 are predicted to be inactive as they lack part or all of the conserved amino acids that should be involved in the enzymatic function. This ‘loss of function’ is particularly marked for Ydh1. Accordingly, CPSF 100 kDa/Ydh1 could be confined to a modulatory function helping in regulating enzymatic activity, as already suggested by Aravind (1). Acquisition of new functions beyond the ancestral enzymatic one is also possible (1).
Other pairs of active/inactive metallo-β-lactamase/β-CASP domains are encountered in some bacterial genomes, such as those of Mycoplasma genitalium (MG139 and MG423) and M.pneumoniae (MPN280 and MPN261), Staphylococcus aureus (SA0940 and SA1118), Lactococcus lactis (yciH and yqga), Deinococcus radiodurans (DRA0069 and DR2417m) and Streptococcus pyogenes (Spy1876 and Spy1020) (Table 2). These pairs of paralogous sequences, one active and the other inactive, can thus be good candidates to constitute the bacterial CPSF 73 kDa/100 kDa subunits. It is worth noting that all the bacterial and archaeal sequences described here seem to be specific to RNA targets, as they are more related to the CPSF 73 kDa proteins than to any other eukaryotic β-CASP proteins, and as they possess (at least those that are predicted to be active) a motif C histidine. According to Anantharaman et al. (25), the last universal common ancestor (LUCA) probably had a polyadenylation system that included at least a CPSF 73 kDa-like enzyme that cleaved transcripts. However, the involvement of the described bacterial metallo-β-lactamase/β-CASP proteins in mRNA processing remains to be investigated.
Interestingly, similar to the way in which we highlighted a novel group of proteins distinct from Artemis and SNM1 which may act on DNA substrates, we also identified CPSF 73 kDa-related sequences (Table 1), which are related to, but distinct from, CPSF 73 kDa. Thus, these proteins could play a pivotal role in mRNA processing, possibly within or in concert with the CPSF. Their exact role also remains to be unravelled.
In conclusion, the ubiquitous distribution of metallo-β-lactamases of the β-CASP family underlines its importance and the analysis presented here should help to elucidate their exact functions relative to nucleic acids, as well as their specificities.
This work was supported by institutional grants and grants from Association de Recherche sur le Cancer (ARC) and APEX (INSERM) to J.-P.V. and from the CNRS-INRA-INRIA-INSERM ‘Action Bioinformatique’ to I.C.