Unification of the RelE and ParE families and identification of new related families of proteins
As Escherichia coli RelE and its close relatives are amongst the functionally best-characterized toxins of the PSK systems, with a wide phyletic pattern in bacteria and archaea [6
], we chose them as the starting point of our investigation of the general cellular functions and natural history of these systems. In order to determine the deep evolutionary affinities of the RelE proteins, we initiated a sequence profile search of the non-redundant (NR) protein database (National Center for Biotechnology Information, Bethesda, USA) using the PSI-BLAST program (threshold for inclusion in profile = 0.01, iterated till convergence) [11
]. At convergence, this search recovered a large number of homologs of RelE, including all the previously described versions, from a variety of bacteria and archaea. We selected distinct representatives from the newly detected members and transitively searched the NR database with these proteins as queries. As these proteins are typically small (85-110 residues in length) and divergent, several searches initiated with different seed sequences were required to exhaustively identify distant homologs of RelE. For example, RelE (gi: 16129522, E. coli) recovers a Staphylococcus aureus protein (gi: 15925446, ortholog of E. coli YoeB) in the third iteration (e = 6e-04), a Campylobacter fetus protein (gi: 28974229, ortholog of E. coli YafQ) in the fourth iteration (e = 2e-04), a Microbulbifer degradans protein (gi: 23028223, ParE family) in the fourth iteration (e = 0.004) and a Magnetococcus protein (gi: 23001539, with the RelE-related segment fused to an SF-I helicase module) in the fifth iteration (e = 0.001). To further ensure the detection of highly divergent members, all unique members detected in these searches were included in a single PSI-BLAST PSSM that was used to iteratively search the NR database till convergence. As a result of this procedure, we were able to recover over 150 distinct homologs (less than 92% identical) of RelE. Reciprocal searches started with diverse proteins detected in the above procedure recovered a common set of obvious RelE-related 'intermediate' sequences supporting these relationships. For example, a reciprocal search with a protein from Bacteroides thetaiotaomicron
(gi: 29350140), which is consistently recovered from various starting sequences that were detected in the above searches, recovers other divergent RelE-related proteins (for example, Nostoc punctiforme protein gi: 23129164) in the third iteration (e = 0.001) and the E. coli RelE itself in the fifth iteration (e = 3e-06). These sequences were then clustered using the BLASTCLUST program and individual clusters were aligned using the T-Coffee program [20
]. These alignments were used to predict individually the secondary structure for each of these clusters with the PHD program [21
]. A very similar arrangement of the predicted secondary structure elements between diverse groups of these proteins further reinforced their relationships.
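The transitive search strategy described above can be illustrated with a minimal sketch. It is not the PSI-BLAST procedure itself: the hit table, the sequence names and the function name `transitive_search` are hypothetical, and the E-values are made up purely for illustration. The sketch only shows the logic of re-using every newly recovered sequence as a query until no new members are found.

```python
from collections import deque

def transitive_search(seed, hits, threshold=0.01):
    """Breadth-first expansion of a homolog set.

    hits maps a query name to a list of (subject, e_value) pairs.
    Every subject recovered at or below the threshold is queued as a new query.
    """
    found = {seed}
    queue = deque([seed])
    while queue:
        query = queue.popleft()
        for subject, e_value in hits.get(query, []):
            if e_value <= threshold and subject not in found:
                found.add(subject)
                queue.append(subject)
    return found

# Hypothetical hit table (illustrative values only, not data from this study).
toy_hits = {
    "seed": [("hom1", 6e-4), ("hom2", 2e-4), ("noise", 3.0)],
    "hom2": [("hom3", 0.004)],
    "hom3": [("hom4", 0.001)],
}

print(sorted(transitive_search("seed", toy_hits)))
```

In this toy table, hom3 and hom4 are reachable only through intermediate queries, which mirrors why several searches from different seeds were needed to recover distant members.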
A striking aspect of these searches was the establishment of the relationship between the ParE (typified by the plasmid RK2-encoded toxin, ParE) [22
] and RelE families of toxins that were previously believed to be unrelated. These toxins have very different targets of action: ParE acts at the level of DNA replication and recombination by interfering with the action of gyrase [7
], whereas RelE acts on RNA at the level of translation [5
]. This observation suggested that despite a common origin and significant sequence similarity, these PSK toxins could have diverged into different functional roles. Hereinafter, we refer to this superfamily of proteins, which includes the toxin families defined by RelE, ParE and other evolutionarily-related proteins that were detected in the above searches, as the RelE/ParE superfamily. The majority of proteins in this superfamily are of similar length and appear to fold into a single globular domain.
A multiple sequence alignment of the entire RelE/ParE superfamily (Figure ) was constructed by combining the alignments of the individual clusters using the Profile Consistency Multiple Sequence Alignment (PCMA) program and refining it based on PSI-BLAST pair-wise alignments and secondary structure predictions. The predicted secondary structure, which is conserved throughout this superfamily, defines an α + β fold with a single amino-terminal strand, followed by a bi-helical hairpin and at least three strong strands at the carboxyl terminus. This secondary structure pattern does not appear to be consistent with that of the MazF/Kid/CcdB superfamily of toxins [23
], which adopts a SH3 barrel fold. Furthermore, no statistically significant relationship can be established between the profiles of the MazF/Kid/CcdB superfamily toxins and the RelE/ParE superfamily. Hence, even though both CcdB and ParE function as gyrase inhibitors, they are likely to fold into very distinct three-dimensional structures.
Figure 1 Multiple alignment of the RelE/ParE superfamily. Multiple sequence alignments of the different families of RelE/ParE were constructed using T-Coffee and PCMA after parsing high-scoring pairs from PSI-BLAST search results. The PHD-secondary structure ...
The multiple alignment of the RelE/ParE family shows that much of the conservation is associated with the residues forming the core of the conserved, predicted secondary structure elements (Figure ). Two charged or polar residues, one associated with the first conserved helix and the second associated with the end of the carboxy-terminal-most strand, are also strongly conserved throughout the superfamily. A third, slightly less conserved polar residue is also seen to be associated with the second universally predicted strand of these proteins. This conservation of a charged residue is consistent with the nucleic acid-associated role of the functionally characterized proteins of this family, and could mediate interactions with RNA or DNA. However, beyond this general similarity, the ParE and RelE proteins have very different modes of action. Experimental studies have suggested that ParE inhibits the gyrase by trapping it with DNA in a stable complex, but so far there has been no report of any catalytic activity in ParE. In contrast, RelE and its homologs have been shown to cleave mRNA only when it is associated with the ribosome, but not free mRNAs [5
]. This suggests that certain members of this superfamily may possess catalytic activity under certain circumstances, and the conserved polar residues could contribute to this activity. In particular, the charged residue, which occurs at the carboxyl terminus of the last strand in these proteins, is an attractive candidate for a potential catalytic residue in the RelE proteins. In light of the relationship between the ParE and RelE families of proteins, it would be of some interest to investigate the possibility of an unexplored DNA-cleaving activity in members of the ParE family, analogous to the ribosome-associated RNase activity of RelE.
We then investigated the evolutionary history of the RelE/ParE superfamily by exploring its phyletic and phylogenetic diversity. The superfamily is widely distributed in the currently sequenced prokaryotes: at least a single member is encoded by the chromosome or one of the large genomic partitions in several bacterial and most archaeal lineages (Figure ). Additionally, plasmids, particularly those from proteobacteria, encode their own RelE/ParE-related proteins. However, no members of this superfamily could be detected in eukaryotes. This phyletic pattern could mean that the superfamily had its origin early in the evolution of one of the prokaryotic lineages, followed by dissemination via plasmids. However, it is also possible that at least one representative of this superfamily was present in the last universal common ancestor and secondarily lost in the eukaryotes.
Figure 2 Relative abundance of some major families of toxins, associated transcription factors (antitoxins) and the UMA2 superfamily in various genomes. The number of proteins containing PIN, RelE/ParE, Doc, Phd/YefM, AbrB, MazF/CcdB/Kid, Rv0623, AF0319 and AF0608 ...
We determined the major lineages of the RelE/ParE superfamily through single-linkage clustering of the proteins with the BLASTCLUST program and construction of neighbor-joining phylogenetic trees with a multiple alignment of all complete members of the superfamily. Several distinct families could be delineated within the superfamily, of which two of the largest families were the RelE and ParE families. Most of the families could be distinguished by means of certain lineage-specific conserved residues (Figure ). As previously noted [6
], the RelE family had the widest phyletic spread with members in several bacterial and archaeal lineages. A small, proteobacteria-specific family, typified by the YafQ protein of E. coli (YafQ family), is the one that is most closely related to the RelE family. These two families are unified by the presence of a shared polar residue at the beginning of strand 1 (Figure ). The ParE family is restricted to bacteria, but is widely distributed with representatives in proteobacteria, cyanobacteria, actinomycetes and cytophagales. The ParE family is distinguished by the presence of a single polar residue in the second conserved strand, whereas all other members of the RelE/ParE superfamily possess two conserved polar residues in this strand (Figure ). Two of the remaining families, one typified by the protein Rv3182 from Mycobacterium tuberculosis (Rv3182 family) and one defined by the YoeB protein of E. coli (YoeB family), are fairly widespread across a range of bacterial lineages (Figure ) and are primarily encoded by the main chromosome. The remaining smaller families are far more sporadic in their distribution, and chiefly occur only in proteobacteria and cyanobacteria (Figure ). One of the most divergent families of the RelE/ParE superfamily is typified by the Z5902 protein (Z5902 family) from the enterohemorrhagic strain of E. coli (O157:H7), and has sporadic representatives from a number of unrelated bacteria such as Magnetobacterium. All members of this family occur fused to a carboxy-terminal superfamily I (SF-I) helicase module, and represent one of the rare instances in which the RelE/ParE domain occurs in a multidomain protein (Figure ).
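The single-linkage clustering step used to delineate the families can be sketched as follows. This is a generic union-find illustration, not the BLASTCLUST implementation: the function name `single_linkage`, the pairwise scores and the cutoff are assumptions chosen only to show how any pair scoring above the cutoff merges two clusters.

```python
def single_linkage(items, pairs, cutoff):
    """Group items so that any pair scoring >= cutoff ends up in one cluster.

    pairs is an iterable of (item_a, item_b, score) tuples.
    Returns a sorted list of sorted clusters.
    """
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in pairs:
        if score >= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for x in items:
        clusters.setdefault(find(x), []).append(x)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical pairwise similarity scores (illustrative only).
toy_items = ["a", "b", "c", "d"]
toy_pairs = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 0.2)]

print(single_linkage(toy_items, toy_pairs, cutoff=0.5))
```

Note that "a" and "c" are grouped even though they were never compared directly; this chaining through an intermediate is the defining property of single-linkage clustering.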
Figure 3 Contextual information and an ordered graph of gene neighborhood and domain architectures of the PSK network. The top panel shows the gene neighborhoods (predicted operons) for some of the PSK systems and other relevant gene clusters. The arrows indicate ...
Wider phyletic spread of the RelE family and its relatives, as compared to the ParE family, may suggest that the former group represents the more ancient member of the superfamily, with the ParE lineage being secondarily derived in bacteria. This would imply that the RNA-cleaving activity is likely to be the primitive function of this superfamily, with a secondary innovation of gyrase inhibitor activity in the ParE family. The sporadic, but widespread phyletic patterns of several families, and differences in representation between strains of the same species (for example, E. coli), suggest a potential role for lateral transfer in the spread of these genes. At the same time, the extensive occurrence of genes for this superfamily in the chromosomal partitions of the genomes, and not merely on plasmids, supports the proposal that they may be widely used as cellular regulators. Thus, the acquisition of members of the RelE/ParE superfamily through lateral transfer could be a means by which certain strains could rapidly evolve a new regulatory pathway that helps in adapting their gene expression to unique environmental stresses.
Gene-neighborhood analysis of the RelE/ParE superfamily and identification of PSK-like systems encoding PilT-N terminal (PIN) domain proteins
Given the tight coupling of the toxin-antitoxin gene pairs, we investigated contextual information derived from their gene neighborhoods [14
]. We concentrated on the newly identified members of the RelE/ParE superfamily to glean previously unknown contextual connections to other genes. Upstream genes encoding transcription factors of the MetJ/Arc superfamily accompany both RelE and ParE families [6
]. This transcription factor serves as the antitoxin, which not only regulates the transcription of genes in the T-A operon, but also physically binds to the toxins and counters their actions [6
]. A systematic survey of all the newly identified members of the RelE, YafQ and ParE families showed that the majority of the genes encoding these proteins were associated with upstream genes for MetJ/Arc transcription factors (Figures ,). In contrast, a range of novel gene neighborhood associations was observed in several of the newly identified families of the RelE/ParE superfamily.
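The kind of survey described above, asking which gene lies immediately upstream of a toxin gene on the same strand, can be sketched as below. This is a minimal illustration under stated assumptions: the `Gene` record, the function name `upstream_partner`, the coordinates and the 50-nucleotide maximum intergenic gap are all hypothetical and are not parameters reported in this study.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"

def upstream_partner(genes, target, max_gap=50):
    """Return the name of the same-strand gene immediately upstream of target.

    Returns None if there is no neighbor, the neighbor is on the other strand,
    or the intergenic gap exceeds max_gap (an assumed cutoff).
    """
    ordered = sorted(genes, key=lambda g: g.start)
    i = next(k for k, g in enumerate(ordered) if g.name == target)
    gene = ordered[i]
    # Upstream means lower coordinates on the + strand, higher on the - strand.
    j = i - 1 if gene.strand == "+" else i + 1
    if j < 0 or j >= len(ordered):
        return None
    neighbor = ordered[j]
    if neighbor.strand != gene.strand:
        return None
    if gene.strand == "+":
        gap = gene.start - neighbor.end
    else:
        gap = neighbor.start - gene.end
    return neighbor.name if gap <= max_gap else None

# Hypothetical gene coordinates (illustrative only).
toy_genes = [
    Gene("tf", 100, 340, "+"),
    Gene("toxin", 345, 630, "+"),
    Gene("other", 2000, 2500, "-"),
]

print(upstream_partner(toy_genes, "toxin"))
```

Applying the same function with the search direction reversed would give the downstream partner, which is the arrangement relevant to families whose regulator gene follows the toxin gene.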
Genes for proteins belonging to the YoeB family of the RelE/ParE superfamily were consistently associated with upstream genes that coded small proteins (~75-90 residues) that were unrelated to the MetJ/Arc superfamily. We investigated this family of small proteins further by initiating iterative PSI-BLAST searches seeded with the E. coli
YefM protein, which is their archetypal representative. These searches showed that they formed a group of bacterial and phage proteins that included the previously characterized DNA-binding proteins, like Phd from phage P1 and DnaT [29
]. Reciprocal searches initiated with the Phd protein recovered YefM and those of its relatives that are encoded by genes co-occurring with genes for the YoeB family of RelE/ParE related toxin homologs. Hereinafter, we refer to these proteins as the Phd/YefM superfamily. The Phd/YefM superfamily is characterized by a conserved domain that is approximately 70 to 75 residues in length. This domain is predicted to bind DNA based on the experimental studies on the phage P1 Phd protein and the E. coli
DnaT protein, which functions in DNA replication [29
]. Secondary structure prediction based on the multiple sequence alignment (Figure ) revealed that the DNA-binding domain of the Phd/YefM superfamily is likely to adopt an α + β fold with amino- and carboxy-terminal helices flanking a central β-hairpin. This secondary structure pattern does not suggest any direct relationship to the MetJ/Arc or HTH folds, suggesting that the Phd/YefM domain may define a unique DNA-binding fold. The Phd protein is a transcription regulator of the toxin Doc, and functions as the antitoxin of the phage P1 plasmid PSK system. Though the Phd-Doc PSK system is functionally analogous to the RelE/ParE systems, the toxin Doc is unrelated to the RelE/ParE superfamily (see below). However, based on the organization of the gene neighborhoods in YoeB family (Figure ), the Phd/YefM proteins encoded by the upstream genes are predicted to function as transcriptional regulators and antitoxins of the YoeB proteins. Interestingly, the Phd/YefM domain is also fused to the MinD ATPase in Deinococcus
that is involved in chromosomal partitioning in bacteria, and domains of the Uma2 superfamily in Desulfitobacterium (Figure ). Given that the Phd/YefM domain is a DNA-binding domain, it is possible that it has been secondarily recruited in certain bacteria to tether other catalytic activities, such as that of MinD, to DNA. The Uma2 domain family shows a lineage-specific expansion in cyanobacteria, Streptomyces (Figure ). The proteins of this superfamily contain conserved acidic residues (data not shown), suggesting that it might also function as an uncharacterized enzyme that acts on DNA.
Figure 4 Multiple alignment of Phd/YefM. The labeling and coloring conventions are as followed in Figure . The species abbreviations are as shown in Figure , and additionally: Bjap, Bradyrhizobium japonicum; Cjej, ...
Genes encoding members of the Rv3182, mlr1576 and VCA0468 families of the RelE/ParE superfamily were consistently associated with conserved downstream genes that encoded small proteins (90-110 residues) unrelated to either the Phd/YefM or MetJ/Arc superfamilies (Figure ). PSI-BLAST searches initiated with these proteins showed that they all contained a conserved helix-turn-helix domain related to the lambda cro protein (cHTH domain). This suggested that they are likely to be DNA-binding proteins that act as transcription regulators of the upstream genes, which encoded members of the RelE/ParE superfamily. By analogy to the other PSK systems, these cHTH proteins are also expected to function as antitoxins countering the action of the products of their upstream genes. However, given the 'reverse' organization with respect to the classical PSK systems, it is conceivable that the functional interaction between the cHTH transcriptional regulator and the toxin component is different in these systems.
One possibility, which is supported by the specific relationship between these cHTH proteins and cro/cI repressors, is that these proteins act as repressors of the toxin gene. The degradation of the repressor under certain conditions could then allow the expression of the toxin component. The Z5902 family of the RelE/ParE superfamily, where the RelE/ParE domain is fused to a carboxy-terminal SF-I helicase module, differs from all other families in its predicted operon organization. These proteins typically co-occur with genes for another large helicase of superfamily II (SF-II), a restriction endonuclease and a DNA methylase. This implies that these proteins could constitute a novel restriction-modification complex, in which the RelE/ParE domain could function as a DNA-binding domain.
The above observations suggested that there is considerable unity in the organization of these toxin-antitoxin gene systems: typically these comprise two small genes, in which one member of the pair encodes a toxin and the other encodes a DNA-binding protein that functions as an antitoxin and a transcription factor. However, the transcription factor and toxin in a functionally comparable pair might belong to entirely unrelated superfamilies of proteins. Thus, genes of the RelE/ParE superfamily may be associated with genes for transcription factors belonging to the MetJ/Arc, Phd/YefM or cHTH superfamilies. Likewise, a survey of the operonic associations for transcription factors showed that the Phd/YefM transcription factors might be associated with at least two unrelated toxin superfamilies, namely RelE/ParE and Doc (see below). Nevertheless, this strongly coupled operon architecture in the form of a gene-dyad encoding a transcription factor and a toxin appears to be a unique signature of PSK and related regulatory systems. Hence, to detect other potentially novel transcription factors and toxins, we systematically surveyed the gene neighborhoods of transcription factors which were close homologs of those associated with the RelE/ParE-superfamily toxins in order to find organizations similar to the PSK systems. We then transitively extended this scanning of gene neighborhoods to the homologs of any potential toxin candidates that were detected in the first screen and sought to detect any other transcription factors that may be associated with these newly predicted toxin-like genes. In particular, we concentrated on only those potential toxins or transcription factors that are conserved across a wide range of cellular genomes. Figure illustrates the network of contextual connections that were recovered in these screens in the form of a directed graph. Previously observed associations such as that of MetJ/Arc transcription factors with toxins of the MazF superfamily [25
], and Phd/YefM transcription factors with toxins of the Doc family were recovered in these screens supporting the effectiveness of this procedure.
Importantly, the screening procedure recovered a novel widespread family of small proteins (~100 residues, typified by MJ1121) that was consistently found downstream of genes for MetJ/Arc transcription factors (Figure ). In this respect their operons closely resembled those of the RelE/ParE and MazF superfamily PSK systems. Sequence profile searches initiated with MJ1121 and its relatives showed that these small proteins consisted entirely of an RNA-binding domain, which we had previously described as the PilT-N terminal (PIN) domain [33
]. Transitive analysis of the gene neighborhoods, using this class of solo PIN domain proteins as the pivot, showed that those versions which were not encoded by genes downstream of MetJ/Arc transcription factors were associated with other sets of conserved upstream or downstream genes (Figure ). Analysis of these genes showed that two groups of solo PIN-protein-encoding genes were flanked by genes for transcription factors of the Phd/YefM and AbrB superfamilies [37
], which are also found in other PSK operons as antitoxins and transcriptional regulators with other unrelated toxin genes (Figure ). For example, MazF, the archetypal member of the MazF/CcdB/Kid superfamily of toxins, is encoded by a gene that is operonic with the MazE gene, which encodes an antitoxin of the AbrB superfamily of transcription factors. Two other groups of solo PIN-encoding genes were associated with upstream genes encoding conserved proteins, typified by AF0608 from Archaeoglobus and Rv0623 from Mycobacterium tuberculosis, respectively (Figure ). Secondary structure prediction based on multiple alignments for these gene products showed that they comprised small globular domains (Figure ) with a conserved extended region followed by two helices. This secondary structure, together with the conservation pattern of the residues in these families, strongly suggested that they might define novel transcription factor families possessing a 'ribbon-helix-helix' fold as seen in the MetJ/Arc superfamily [28
]. Yet another group of solo PIN domain proteins, typified by AF0099, were encoded by genes associated with upstream genes which encoded predicted DNA-binding proteins containing HTH domains belonging to the Pipsqueak family [28
]. Finally, one group of solo PIN-protein-encoding genes was consistently associated with upstream genes encoding a family of small proteins that did not show detectable similarity to any known family of transcription factors. A multiple alignment of this family, with AF0319 as an archetypal member, reveals a simple α + β fold with a highly conserved amino-terminal region enriched in positively charged residues (Figure ). Based on the contextual precedence offered by the other T-A operons, we predict that AF0319 defines a novel class of transcription factors that regulate the expression of the PIN protein-encoding genes.
Figure 5 Multiple alignment of novel transcription factors associated with the PSK operons. (a) AF0608 family, (b) Rv0623 family and (c) AF0319 family. The labeling and coloring conventions are as followed in the legend to Figure . The species ...
Based on this web of contextual connections offered by gene neighborhoods (Figure ) we predict that the above-detected group of solo PIN domain proteins defines a toxin-like component of novel PSK-related regulatory systems. These predicted PSK-related systems with the PIN domain are as widespread as the systems with proteins of the RelE/ParE superfamily in both archaea and bacteria.
Functional and evolutionary connections of the PIN and Doc domains and eukaryotic nonsense-mediated mRNA decay
In contrast to the RelE proteins that are restricted to prokaryotes, the PIN domain is found in all three superkingdoms of life. This suggested that the PSK-related regulatory systems with PIN domain proteins might throw light on the more general roles of such systems. Given the RNA-binding role for the PIN domain [34
], it is likely that these systems elicit their action by acting upon some RNA substrate. Importantly, a highly-conserved solo PIN domain protein is encoded by the archaeal super-operons that contain genes for ribosomal proteins and translation GTPases, like eIF3γ (Figure ). This contextual connection implies that this version of the solo PIN domain is likely to function in the translation process in association with the ribosome and eIF3γ. This observation, along with the analogy to the Doc, RelE and possibly the MazF systems, implies that the PSK-related systems with PIN domains might function as translation inhibitors. The PIN domain proteins from eukaryotes suggest a deeper functional analogy between the PIN and RelE domains. These eukaryotic PIN domain proteins, such as SMG-7 from Caenorhabditis elegans
and Nmd4p from yeast, are known to participate in the process of nonsense codon mediated decay (NMD) of mRNA [36
]. In eukaryotes, this system specifically targets mRNAs with premature stop codons for degradation [42]. This suggests that the prokaryotic PSK-related systems with PIN domain proteins are likely to target transcripts in a process analogous to NMD of mRNA. There has been an earlier proposal that the PIN domain may be related to 3'→5' exonucleases [36
]. However, even though these two domains may have a common fold, they show differences in the conserved residues that constitute their active sites (additional data file 1) [34
]. Hence, it is possible that certain PIN domains, analogous to the RelE domains, cleave RNA only when it is associated with the ribosome. Thus, we predict that a ribosome-associated RNase activity is likely to be the common mechanism of action for the solo PIN proteins in NMD as well as in prokaryotic PSK-related systems.
The above observations suggest that the crucial PIN domain protein of the NMD system is perhaps a remnant of an ancient PSK-type regulatory system. The emergence of the nucleus in eukaryotes, and the uncoupling of translation and transcription could have caused the PIN domain protein to be released from the tight regulatory circuit involving a coupled antitoxin transcription factor. Our earlier studies have suggested that other key components of the NMD system and the eukaryotic translation initiation systems have evolved from a common group of ancestral proteins [44
]. The evolution of interactions with this eukaryote-specific complex might have contributed to the decoupling of the solo PIN domain proteins from the ancestral PSK-related system, and led to their incorporation into the NMD system.
We examined other superfamilies of toxins to determine if they included widely distributed members with a general functional significance similar to the solo PIN domain proteins. Several PSK-systems have a very limited phyletic distribution [6
] and are not further detailed here because they are unlikely to throw light on broadly deployed regulatory mechanisms. The well-known MazF/CcdB/Kid superfamily is widely represented in the bacterial superkingdom [25
] and a single archaeal genus, Pyrococcus
, but not in eukaryotes (Figure ). As the structures of several proteins from this superfamily are currently available, we searched the PDB database [45
] with them to detect other related structures. These searches indicated that although the MazF/CcdB/Kid domain possessed an SH3-barrel fold, it was not closely related to any other members of this fold. Hence, it is likely that these domains represent a specialized version of the SH3-barrel fold that was derived in the bacteria.
The Doc toxin of the Phd-Doc PSK system has been hitherto detected only in P1-like phages and related mobile DNA elements from γ-proteobacteria [6
]. Our sequence profile searches with the PSI-BLAST program recovered several homologs of Doc from several proteobacterial lineages, low GC Gram positive bacteria, actinobacteria, cyanobacteria, spirochetes, Aquifex
, some archaeal lineages and animals, with statistically significant expect values (e < 0.001). Amongst these newly-detected homologs of Doc were proteins such as the Fic protein from E. coli
], and the huntingtin associated protein E (HYPE) [48
]. The conserved region shared by all these proteins was approximately 125 to 150 residues long, and appeared to define a novel globular domain that we refer to, hereinafter, as the Doc domain.
A multiple alignment of the Doc domain superfamily (Figure ) shows that these proteins share several nearly absolutely conserved charged or polar residues, and the proteins are predicted to assume an α-helical fold. The amino-terminal half contains a highly-conserved histidine and a basic residue (almost always arginine), while the carboxy-terminal half contains a characteristic motif with an HX3[DE]XNXR (where X is any amino acid) signature (Figure ). This conservation pattern suggests that the Doc domain is a catalytic domain, with the charged or polar residues constituting the catalytic residues. While this pattern of residues does not match those seen in the active sites of any known class of α-helical enzymes, the conserved histidines and asparagine could form a metal-chelating site. A mutant version of the Doc protein, in which the amino-terminal-conserved histidine is disrupted, loses its toxin activity [30
]. This suggests that the catalytic activity of the Doc protein is required for its toxicity. Experimental evidence has suggested that Doc blocks a step in translation [2
]. This observation, along with the predicted enzymatic nature for the Doc domain, suggests that it might possibly act as a nuclease that blocks translation by cleaving transcripts. Alternatively, it is possible that it acts as an uncharacterized RNA-processing enzyme that modifies transcripts and makes them unusable for translation.
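The carboxy-terminal signature of the Doc domain can be written as a regular expression for quick scanning of protein sequences. The sketch below assumes the signature is read as histidine, any three residues, aspartate or glutamate, any residue, asparagine, any residue, arginine; the function name `find_doc_motif` and the test sequence are hypothetical and are not taken from the alignment.

```python
import re

# H, three arbitrary residues, D or E, any residue, N, any residue, R.
DOC_MOTIF = re.compile(r"H[A-Z]{3}[DE][A-Z]N[A-Z]R")

def find_doc_motif(sequence):
    """Return (zero-based start, matched segment) for each motif occurrence."""
    return [(m.start(), m.group()) for m in DOC_MOTIF.finditer(sequence.upper())]

# Made-up peptide for illustration only; it is not a real Doc protein sequence.
toy_sequence = "MKAHPFVDGNKRTA"

print(find_doc_motif(toy_sequence))
```

A match to such a short pattern is only suggestive, since nine-residue signatures can occur by chance; in practice it would complement, not replace, the profile-based searches described above.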
Figure 6 Multiple alignment of the Doc domain. The three major families of the Doc domain superfamily have been delineated by small blank spacers. The labeling and coloring conventions are as followed in the legend to Figure . The species abbreviations ...
A phylogenetic analysis of the Doc superfamily reveals that it contains three distinct families (Figure ). The first family contains the Doc protein from phage P1 and its homologs from several bacterial genomes. Typically, upstream genes for an antitoxin transcription factor accompany genes encoding members of this family (Figure ). All these proteins contain a minimal stand-alone version of the Doc domain. The second family, typified by the animal HYPE protein, is also found in several bacteria and some archaea. These proteins contain a longer insert after the conserved amino-terminal motifs (Figure ) and are typically multidomain proteins. The animal HYPE contains an amino-terminal tetratricopeptide repeat (TPR) module, whereas most prokaryotic versions are fused to a carboxy-terminal DNA-binding winged HTH (wHTH) domain [28
]. Interestingly, a single bacterial protein, XCC2565 from Xanthomonas
, has leucine-rich repeats (LRR, Figure ) amino-terminal to the Doc domain. The presence of TPR repeats is reminiscent of similar TPR modules that are present amino-terminal to the PIN domain in NMD proteins such as Smg-7 [36
]. The human HYPE protein interacts with the huntingtin protein, which also contains similar α-helical ARM repeats that adopt a superstructure similar to the TPR repeats [48
]. While the physiological relevance of these interactions is unclear, it is plausible that the HYPE is part of an uncharacterized multiprotein complex in the animal cells that may have a regulatory role similar to the chromosomally encoded versions of the bacterial Doc systems. Although no transcription factor genes are seen accompanying the genes for the prokaryotic HYPE orthologs, the carboxy-terminal wHTH could possibly function as an inbuilt transcriptional regulator for these proteins. A single bacterial member of the HYPE family, namely PfhB2 from Pasteurella
, contains two Doc domains fused to several fibrinogen-type repeats and a conserved domain found in several bacterial agglutinins (Figure ). This protein is likely to be an extracellular protein, and may represent an unusual case of recruitment of the Doc domain for a novel function, perhaps as a secreted nuclease or an enzyme for the processing of extracellular polysaccharides. The third family of Doc-related proteins is comprised of the E. coli
Fic protein and its orthologs from diverse bacteria (Figure ). Like the HYPE family, they also contain a longer insert in the Doc domain after the amino-terminal conserved motif (Figure ). These clearly do not appear to be parts of a PSK-related system for they do not show any conserved operon architectures. Mutations in the Fic protein result in filamentous growth, indicating a role in cell division [46
]. Based on the predicted catalytic activity for the Doc superfamily, it is possible that the Fic proteins may target specific transcripts when induced under certain growth conditions.
The above analysis suggests that there is considerable diversity amongst the T-A systems. Most widespread prokaryotic PSK or related systems appear to have been derived by mixing and matching a few major classes of toxins and antitoxins (Figure ) that appear to have independent evolutionary origins. The major classes of toxins are the RelE/ParE superfamily, the MazF/CcdB superfamily, the Doc superfamily and the solo PIN domain superfamily (Figure ). The major classes of antitoxin transcription factors are the MetJ/Arc superfamily and related ribbon-helix-helix fold proteins, the HTH superfamily, the AbrB superfamily and the Phd/YefM superfamily. This suggests that the PSK-related systems have not all descended from a common ancestor, but have been assembled on different occasions from a relatively small pool of proteins. One simple hypothesis that could account for the observed pattern of gene neighborhoods is the in situ displacement of genes for functionally related proteins in a tightly maintained operon. In this process, the operon architecture is maintained due to the strong functional interactions of the encoded polypeptides, but the actual origin of the polypeptides encoded by it is not constrained. This is likely to happen if unrelated polypeptides can perform the same function equally effectively. This is consistent with the functional identity of different superfamilies of antitoxins that act as transcription factors. The potential functional equivalence of several unrelated toxins, such as RelE, the PIN domain and Doc domain toxins, or ParE and CcdB, suggests that even the toxin genes are viable candidates for in situ displacement by analogs. Thus, toxin or antitoxin genes could be displaced in situ by functionally equivalent, but unrelated genes, while the operon architecture itself is preserved. This process is highly reminiscent of the displacement of functionally equivalent, but evolutionarily unrelated genes in certain DNA recombination-related operons in bacteria and phages [49
]. However, the case of the RelE/ParE superfamily suggests that toxin-antitoxin gene pairs could undergo vertical evolutionary divergence to acquire very distinct functions.
Finally, the abundant presence of PSK-related systems in prokaryotic chromosomes supports the original proposal of Gerdes and recent experimental studies that these systems could function as more generic regulatory systems [5
]. In particular, they appear to have proliferated on the chromosomes of some prokaryotes, such as the RelE system in several proteobacteria and the PIN system in archaea, Nostoc
and Mycobacterium tuberculosis. Furthermore, in some cases, domains such as Doc, PIN, RelE/ParE and YefM appear to have been incorporated in systems that function outside the context of classic PSK-related systems.