The Artemis sequence was searched against domain databases using two different programs: HMM search and RPS-BLAST. A metallo-β-lactamase domain (Pfam00753; 16) was highlighted between amino acids 5 and 173 after the HMM search (E-value 0.062) or amino acids 21 and 145 with RPS-BLAST (E-value 8 × 10⁻⁵). The alignment performed using RPS-BLAST ends after the conserved Gly–Asp sequence (motif 4), whereas the alignment obtained using the HMM search is much longer. In the latter case, however, the conserved histidine residue of the metallo-β-lactamase family (motif 5) is missing in Artemis, being aligned with a phenylalanine, which cannot substitute for histidine in zinc binding. Moreover, the C-terminal region of the HMM alignment (amino acids 142–173), after the conserved Gly–Asp sequence (motif 4), shares much less sequence identity/similarity with the Pfam metallo-β-lactamase profile, indicating that this part of the alignment is probably fortuitous. This observation prompted us to analyse further the sequences after motif 4, with the particular aim of identifying the missing conserved histidine (motif 5) of the metallo-β-lactamase signature.
Identification of the members of the β-CASP family
Thus, we searched for similarities of the Artemis sequence with other proteins after motif 4 and up to amino acid 370 (ending a globular domain, as indicated after HCA, star in Fig. B). Using this query (amino acids 142–370), PSI-BLAST searches against the NRDB (891 607 sequences) at NCBI revealed by iteration 1 significant similarities with several proteins including mouse SNM1. Further iterations highlighted similarities with other proteins, including yeast PSO2 [17 significant matches (E-value <0.002) at convergence by iteration 3]. Marginal but interesting similarities were also observed just above the threshold E-value (E = 1.5) with a hypothetical protein from Sulfolobus solfataricus, described as a putative mRNA 3′-end polyadenylation factor (SSO0761, see Table ). This similarity was supported at the structural level as a true relationship using HCA (see Materials and Methods for details; Fig. B). Moreover, a metallo-β-lactamase signature was also detected just before the similarity region using RPS-BLAST (Fig. A). Thus, the S.solfataricus sequence was included in a new position specific score matrix (PSSM), which was then used for additional iterations. This led to the identification at convergence by iteration 16 of numerous proteins, including various hypothetical prokaryotic proteins, the 73 kDa subunit of CPSF from mammals (iteration 4) and yeast (Ysh1, iteration 5) as well as the 100 kDa subunit of CPSF from mammals and from fission yeast (iteration 8) (Tables –).
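The query definition and the threshold-based sorting of hits described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the names `Hit`, `subsequence` and `split_hits` are hypothetical, and only the 1-based residue range (142–370) and the E-value cut-off (0.002) are taken from the text.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """A database match reduced to its name and E-value (hypothetical structure)."""
    name: str
    evalue: float


def subsequence(seq: str, start: int, end: int) -> str:
    """Return residues start..end, 1-based and inclusive, as ranges are quoted in the text."""
    if start < 1 or end > len(seq) or start > end:
        raise ValueError("requested range lies outside the sequence")
    return seq[start - 1:end]


def split_hits(hits: List[Hit], threshold: float = 0.002) -> Tuple[List[Hit], List[Hit]]:
    """Separate hits below the E-value threshold from marginal hits.

    Marginal hits are not discarded: in the text, a match at E = 1.5 was
    examined by other means before being added to the profile.
    """
    significant = [h for h in hits if h.evalue < threshold]
    marginal = [h for h in hits if h.evalue >= threshold]
    return significant, marginal
```

With a full-length protein sequence held in a string `seq`, the query used in the text would correspond to `subsequence(seq, 142, 370)`. Keeping the marginal hits in a separate list, rather than dropping them, mirrors the way the S.solfataricus match was retained for structural inspection.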
Figure 2 Comparison of the HCA plots of several members of the β-CASP family. (A) The first four metallo-β-lactamase conserved motifs; (B) the β-CASP region.
Members of the β-CASP family: archaea
Members of the β-CASP family: eukaryotes
The similarities of the detected sequences were confirmed by reciprocal iterative strategies, sometimes extending the detected family to other members, as in the case of the Saccharomyces cerevisiae Cft2/Ydh1 sequence, orthologous to the mammalian CPSF 100 kDa subunits. Most of the matching proteins possess, before the detected regions, sequences belonging to the metallo-β-lactamase superfamily, as detected against the Pfam database (presence of motifs 1–4). Those sequences that do not match the Pfam metallo-β-lactamase profile nevertheless possess a metallo-β-lactamase fold, as shown by global similarity searches, but have lost some or all of the conserved amino acids of the four consecutive motifs (Tables –). This observation suggests that the region considered, ranging from amino acids 142 to 370 in Artemis and specific to the defined β-CASP family (and for this reason named the ‘β-CASP’ motif), does not form an independent domain, but rather is associated with, or even integrated into, the metallo-β-lactamase domain, where it plays a particular role, probably related to nucleic acids, as all of the characterised members detected appear specific for this kind of substrate (see below). Adding N-terminal sequences to the β-CASP signature of Artemis in a new database search did not reveal any members of this family that we might have missed using the β-CASP motif stricto sensu.
Identification of conserved amino acids within the β-CASP motif
An in-depth analysis of all the identified members of the β-CASP family was then undertaken, in particular with the aim of refining the alignments proposed by PSI-BLAST and of identifying conserved residues that could play a critical role in their function.
HCA was used to manually localise on the bidimensional level (secondary structure context) conserved motifs, as proposed in the PSI-BLAST results (Figs and and Tables –). Several major anchor points of the alignment were highlighted, each consisting of conserved hydrophobic residues gathered into a cluster, which represents the internal face of a conserved regular secondary structure, often accompanied (upstream, within and downstream) by identical or highly conserved non-hydrophobic residues. Motif A is characterised by an acidic residue (D or E) after a stretch of hydrophobic residues typical of a β-strand structure [ is a hydrophobic amino acid]. Motif B includes a histidine ending an amphiphilic β-strand structure and followed by an α-helical structure. C-terminal to this last α-helix, and at the end of another predicted β-strand, a conserved histidine (motif C) can also be found in all the sequences of the β-CASP family, with the exception of a few sequences including those of the Artemis/SNM1/PSO2 group, in which this histidine is most often substituted by a valine (red circle in Fig. B). Other conserved motifs were detected along the compared sequences, but none includes highly conserved polar residues.
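The verbal description of motif A (a stretch of hydrophobic residues followed by an acidic residue) can be turned into a crude sequence scan. The sketch below is a simplification and not the HCA procedure itself, which works on two-dimensional cluster shapes: the hydrophobic alphabet `VILFMYW`, the minimum run length of three and the function name `find_motif_a_candidates` are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Assumed hydrophobic alphabet; the text only says "hydrophobic residues".
HYDROPHOBIC = "VILFMYW"


def find_motif_a_candidates(seq: str, min_run: int = 3) -> List[Tuple[int, int, str]]:
    """Find runs of at least `min_run` hydrophobic residues ending with D or E.

    Returns (start, acid_position, matched_text) with 1-based positions.
    A match is only a candidate: it says nothing about secondary structure.
    """
    pattern = re.compile(rf"[{HYDROPHOBIC}]{{{min_run},}}[DE]")
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(seq.upper())]
```

Such a pattern will match many unrelated segments in any protein, which is why the text relies on the structural context and on conservation across the family rather than on the sequence pattern alone.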
Figure 3 Multiple alignment of conserved motifs of representative members of the β-CASP family. The alignment is divided into two main blocks, the first one including metallo-β-lactamase motif 4 (with the conserved aspartic acid in red).
The three conserved polar amino acids characterising motifs A, B and C are all located at the end of predicted β-strands, like all of the zinc-binding residues of canonical metallo-β-lactamases (Fig. ). This particular position, together with their conservation, suggests that they could be located in the vicinity of the metallo-β-lactamase active site and that they could play a specific role in the probable enzymatic function of the defined β-CASP family. Accordingly, mutation of the Asp165 residue (motif A) in Artemis, as well as that of His319 (motif B), strongly compromises both the V(D)J recombinase activity and the in vitro endonuclease activity (5; our unpublished observations).
Motif B histidine is conserved in all members of the β-CASP family, in contrast to motif C histidine which is substituted by a valine in the Artemis/SNM1/PSO2 group. Therefore, it is possible that motif C histidine, which is otherwise conserved for CPSF 73 kDa and almost all of the CPSF 100 kDa (with the exception of yeast Ydh1p), may play an important role in the specificity of members of the β-CASP family towards RNA targets. It is worth noting that it is also conserved for most of the bacterial β-CASP members, suggesting that these as yet uncharacterised proteins could act specifically on RNA as well (Tables and ).
Members of the β-CASP family: bacteria
Interestingly, our database mining also revealed an as yet uncharacterised group of proteins with a non-histidine motif C (Table ), which shares similarities with, but is distinct from, Artemis and SNM1 (the human β-CASP sequence of this protein shares 22 and 35% sequence identity with Artemis and human SNM1, respectively). This protein has been previously named SNM1C (17). According to our observations, the SNM1C group can thus be predicted to play a role in DNA processing, which remains to be experimentally investigated.
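Percentages of sequence identity such as those quoted above depend on how the denominator is defined. The following sketch shows one common convention, assumed here for illustration only: identity is counted over alignment columns in which neither sequence has a gap. The text does not state which convention was used, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence is gapped.

    Both inputs must come from the same pairwise alignment and therefore
    have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)
```

Other conventions (dividing by the length of the shorter sequence, or by the full alignment length including gaps) give different values for the same alignment, so figures from different sources are only comparable when the convention is known.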
Refinement of alignments using HCA—the particular case of predicted inactive members of the β-CASP family
In some cases, the pairwise similarities proposed in the PSI-BLAST results, although extending beyond motif 4 of the metallo-β-lactamase signature, were patchy and did not always encompass conserved motifs of the emerging multiple alignment (e.g. in the PSI-BLAST results relative to the Schizosaccharomyces pombe CPSF 100 kDa: motifs B and C are missing, Fig. B). This observation suggests that (i) the lexical procedure used by PSI-BLAST for alignment was not sufficient in itself to align highly divergent sequences, (ii) large insertions or deletions could interfere with the recognition of conserved motifs, and/or (iii) motifs could really be absent in some proteins. Thus, HCA was also used to refine the proposed relationships, as illustrated in Figure B with the particular case of proteins of the CPSF 100 kDa family. On the one hand, these alignments do indeed tolerate very large insertions just before motif B, which hamper alignment beyond this point, at least in the case of the S.pombe CPSF 100 kDa sequence, although clusters typical of motifs B and C can be easily detected farther on. On the other hand, these sequences have lost some or all of the highly conserved residues of the different motifs, although the global fold of the domain has been conserved. The histidine of motif B is indeed not present in any of the three CPSF 100 kDa sequences, whereas the histidine of motif C is absent only in the yeast sequence (see also Table ). Critical residues are also absent from motifs 1–4 of the metallo-β-lactamase domain preceding the aligned region (Fig. A), reinforcing the hypothesis that conserved residues of motifs A–C, together with motifs 1–4, play a key role in the probable enzymatic function of the β-CASP family. This function is therefore expected to be lost in some proteins of the family, including the CPSF 100 kDa subunits. The missing motifs, highlighted using HCA, were confirmed as true relationships by using them as queries in PSI-BLAST searches in a ‘reverse’ strategy.
Some predicted ‘non-catalytic’ members of the β-CASP family, in particular missing motifs A and B, were also highlighted in bacterial genomes (Tables and ).
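The reasoning developed in this and the preceding sections amounts to a simple decision rule on the residues found at motifs A, B and C. The sketch below encodes that rule for illustration only. It is a loose reading of hypotheses stated in the text (an acidic residue at motif A and a histidine at motif B for activity; a histidine at motif C possibly linked to RNA targets), not a validated predictor, and the function name and output labels are hypothetical.

```python
def classify_beta_casp(motif_a: str, motif_b: str, motif_c: str) -> str:
    """Classify a family member from the residues aligned at motifs A, B and C.

    Assumed rule, paraphrasing the hypotheses in the text:
      - motif A acidic (D or E) and motif B histidine: predicted active;
      - otherwise: predicted non-catalytic;
      - among predicted active members, motif C histidine hints at RNA
        targets, any other residue at DNA targets.
    """
    active = motif_a.upper() in ("D", "E") and motif_b.upper() == "H"
    if not active:
        return "predicted non-catalytic"
    if motif_c.upper() == "H":
        return "predicted active, possibly RNA-directed"
    return "predicted active, possibly DNA-directed"
```

For example, the residues reported in the text for Artemis (aspartate at motif A, histidine at motif B, and the valine most often found at motif C in its group) give `classify_beta_casp("D", "H", "V")`, which returns the DNA-directed label. The rule is deliberately coarse and ignores motifs 1–4, which the text also treats as important for activity.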
Conserved motifs specific to the β-CASP family relative to the metallo-β-lactamase active site
Our initial aim was to identify within the β-CASP family a highly conserved amino acid located at a position relative to the active site similar to that of the motif 5 histidine of canonical metallo-β-lactamases (Fig. ). As stated above, we identified not one but three highly conserved amino acids, suggesting that the active site of members of the β-CASP family could accommodate more critical residues than those of canonical metallo-β-lactamases.
Histidine of motif B appears to be the best candidate to correspond to motif 5 histidine of canonical metallo-β-lactamases, as already noticed by Aravind (1), since it is conserved in all of the members of the β-CASP family, in contrast to motif C histidine. If the sought-after motif 5 histidine actually corresponds to the highlighted motif B histidine, sequences intervening between motifs 4 and 5 should correspond to a large insertion within the metallo-β-lactamase domain. Another possibility is that no large insertion should occur between metallo-β-lactamase motifs 4 and 5 and that motif 5 should actually correspond to the highlighted motif A. Both hypotheses are in fact supported by the loss of Artemis activity in Asp165 and His319 mutants (5; our unpublished observations). Like motif 5, corresponding to strand β12 and the subsequent loop of the β-lactamase structure shown in Figure , motif A also corresponds to a β-strand ended by a conserved polar residue. Moreover, like motif 5 in canonical metallo-β-lactamase structures, motif A is separated from motif 4 by a short peptide including a helix (Figs B and ; helix α4 in Fig. ). Regarding this hypothesis, the canonical motif 5 histidine should, however, be substituted by an acidic residue and sequences located after motif A should correspond to a distinct additional domain accompanying the metallo-β-lactamase domain. This hypothesis is further supported by a comparison of the three-dimensional structure of canonical metallo-β-lactamases with that of glyoxalase II (20), in which the strand ended by motif 5 histidine is followed by a distinct globular domain, differing from the C-terminal helix of metallo-β-lactamases. Regarding this possibility, it could be hypothesised that substrates of the β-CASP family bind at the domain interface, as substrates of glyoxalase II do.