The TRIM family is composed of multi-domain proteins that display the Tripartite Motif (RING, B-box and Coiled-coil) that can be associated with a C-terminal domain. TRIM genes are involved in ubiquitylation and are implicated in a variety of human pathologies, from Mendelian inherited disorders to cancer, and are also involved in cellular response to viral infection.
Here we defined the entire human TRIM family and also identified the TRIM sets of other vertebrate (mouse, rat, dog, cow, chicken, tetraodon, and zebrafish) and invertebrate species (fruitfly, worm, and ciona). By means of comparative analyses we found that, after assembly of the tripartite motif in an early metazoan ancestor, few types of C-terminal domains have been associated with this module during evolution and that an important increase in TRIM number occurred in vertebrate species concomitantly with the addition of the SPRY domain. We showed that the human TRIM family is split into two groups that differ in domain structure, genomic organization and evolutionary properties. Group 1 members present a variety of C-terminal domains, are highly conserved among vertebrate species, and are represented in invertebrates. Conversely, group 2 is absent in invertebrates, is characterized by the presence of a C-terminal SPRY domain and presents unique sets of genes in each mammal examined. The generation of independent sets of group 2 genes is also evident in the other vertebrate species. Comparing the murine and human TRIM sets, we found that group 1 and 2 genes evolve at different speeds and are subject to different selective pressures.
We found that the TRIM family is composed of two groups of genes with distinct evolutionary properties. Group 2 is younger, highly dynamic, and might act as a reservoir to develop novel TRIM functions. Since some group 2 genes are implicated in innate immune response, their evolutionary features may account for species-specific battles against viral infection.
The TRIM gene family encodes proteins involved in a broad range of biological processes and characterized by the presence of the tripartite motif (hence the name TRIM), which consists of a RING domain, one or two B-box motifs and a Coiled-coil region (RBCC) [1,2]. The tripartite motif is always present at the N-terminus of the TRIM proteins. The order of the domains that compose the motif is also conserved: a RING finger domain precedes the B-box motif(s), and a Coiled-coil (CC) region invariably follows. Even if one of the domains is missing, the order of the remaining ones is maintained. Different C-terminal domains are associated with the tripartite motif in the TRIM family [1-4].
Both RING and B-boxes are cysteine-rich zinc-binding domains. The RING finger domain is present, in combination with other domains, in hundreds of proteins and is defined by a linear series of conserved cysteine and histidine residues that represent zinc coordination sites. The B-boxes are the critical determinants of the TRIM family and can be present as B-box1 and B-box2, which share a similar but distinct pattern of cysteine and histidine residues. When both B-box domains are present, type 1 always precedes type 2; when only one B-box domain is present, it is always type 2. While the tripartite motif is restricted to this protein family, the C-terminal domains are also found in unrelated proteins. A limited choice of C-terminal domains is found in association with the tripartite motif and determined the recent classification of the TRIM proteins into subfamilies. This conserved multi-domain structure appears to behave as an integrated module, rather than a collection of separate motifs, suggesting a possible common function [1,6].
We previously classified 35 human RBCC-containing proteins as a gene-protein family and named it TRIM. We observed that these proteins have strong self-association ability, mainly mediated by their CC region, which results in the formation of large protein complexes. In most cases the TRIM proteins identify different discrete nuclear and/or cytoplasmic sub-cellular structures.
The presence of the RING domain and recent experimental evidence indicate that these proteins can act as E3 ubiquitin ligases, the proteins responsible for mediating the transfer of the ubiquitin moiety to specific targets [7-10]. Alteration of their activity within ubiquitylation processes might be responsible for the clinical manifestations observed in human diseases caused by mutations in TRIM genes [3,6]. PML, RFP, TIF, and EFP are implicated in tumor onset and progression [11-14]. Other TRIM genes are involved in Mendelian inherited disorders: MID1 and MUL are altered in two developmental genetic diseases, Opitz Syndrome and Mulibrey nanism, respectively [15,16]. TRIM32 is involved in both a form of muscular dystrophy and a form of Bardet-Biedl Syndrome (BBS11), and MURF-1 is implicated in muscular atrophy [17-19]. Ro52 is the target antigen of auto-antibodies in both Sjogren syndrome and Systemic Lupus Erythematosus. Finally, TRIM5α has been identified as the major factor restricting HIV-1 during the early phase of infection in Old World monkey cells [3,21].
The TRIM family represents one of the largest classes of putative single protein RING-finger E3 ubiquitin ligases, strongly suggesting that the tripartite motif was selectively maintained to carry out a specialized basic common function within the ubiquitylation process. We used a genomic approach to complete the identification of all human TRIM genes and to study their evolutionary relationships in vertebrate and invertebrate organisms. We observed a general paradigm for the evolution of this family and propose a possible relationship between the evolution of TRIM genes and that of their function.
To search for all TRIM genes in human, mouse, rat, cow, and dog, we screened their genomic sequences, using all known mammalian TRIM sequences as queries, with the BLAST and BLAT algorithms at the NCBI and UCSC genome browsers. We also performed a Pattern-Hit Initiated BLAST (PHI-BLAST) search against both redundant and non-redundant databases using the sequence patterns that we previously defined for the two B-box domains as queries. In addition, we used representative B-box1 and B-box2 sequences to perform TBLASTN genome screening aimed at identifying all the potential loci encoding B-box-containing proteins. Each retrieved genomic sequence was compared to available EST/cDNA sequences to infer gene architecture. For those genes that lacked a transcript counterpart, we performed a careful manual examination of the genomic sequences by aligning them to the most closely related TRIM of the same or other species to define exon boundaries.
By combining these methods in several iterations, we retrieved the entire set of human TRIM genes. While most of these have been recently reported in the context of other studies, we also report some novel TRIM genes [3,4,22] (http://TRIMbase.tigem.it). Some of the genes we found are present as perfect or almost perfect multiple duplications in the pericentromeric region of chromosome 11 and it is difficult in these cases to establish whether they represent expressed genes (see also below). We also annotated the TRIM complement in mouse, rat, cow, and dog. The inventory of these sets and their comparisons are available at (http://TRIMbase.tigem.it).
The majority of the human proteins reported in http://TRIMbase.tigem.it fulfill the TRIM rule of domain order and composition (RING, B-box(es), CC, C-terminal domain(s)). During our searches, we also found genes encoding 'incomplete' TRIM proteins, i.e. proteins lacking one of the domains of the tripartite motif (RING, B-box, or CC). Our analysis clearly indicated that, unlike the RING and CC domains, B-box domains are virtually always found within the tripartite motif in metazoans. However, there are a few exceptions in which the B-box(es) are associated with only one of the other domains of the tripartite motif: in humans, 6 proteins that possess B-box(es) lack the RING domain (B-box and CC) and 2 have a very short sequence after the B-box and almost entirely lack the CC region (RING and B-box).
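The domain-order rule described above can be written down as a small check. The following is only a minimal sketch in Python: the function name `classify_architecture`, the domain labels, and the three return values are illustrative assumptions, not part of the annotation pipeline used in this study. It encodes the stated rules (RING before B-box(es) before CC, B-box1 always before B-box2, and a lone B-box always being type 2) and ignores C-terminal domains.

```python
# Canonical N- to C-terminal order of the tripartite-motif domains.
# Labels are hypothetical names chosen for this sketch.
CANONICAL_ORDER = ["RING", "B-box1", "B-box2", "CC"]


def classify_architecture(domains):
    """Classify an N- to C-terminal list of domain names.

    Returns 'orthodox' (RING, B-box(es) and CC all present in order),
    'incomplete' (a B-box plus only one of RING or CC), or 'non-TRIM'.
    Domains outside the tripartite motif (e.g. 'SPRY') are ignored.
    """
    motif = [d for d in domains if d in CANONICAL_ORDER]
    ranks = [CANONICAL_ORDER.index(d) for d in motif]
    # The order must follow the canonical one, with no repeated domain.
    if ranks != sorted(set(ranks)):
        return "non-TRIM"
    # A B-box is required, and a single B-box must be of type 2.
    if "B-box2" not in motif:
        return "non-TRIM"
    has_ring = "RING" in motif
    has_cc = "CC" in motif
    if has_ring and has_cc:
        return "orthodox"
    if has_ring or has_cc:
        return "incomplete"
    return "non-TRIM"
```

For example, `classify_architecture(["B-box2", "CC"])` returns `'incomplete'`, matching the B-box and CC proteins that lack the RING domain.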
In the evolutionary analyses reported in this study we included the 68 genes listed in Table 1: the 60 'orthodox' TRIM genes and the 8 'incomplete' TRIM or TRIM-like genes (possessing the B-box domain associated with either the CC or the RING domain). The 'incomplete' TRIM-like genes are included in Table 1 and referred to in the text by their non-TRIM names to underline that they do not fully adhere to the strict definition of a TRIM member; within the databases they are also annotated with a TRIM name, which is reported in Table 1. Moreover, of the chromosome 11 pericentromeric clusters we included only representative members in the analyses.
With the identification of the entire complement of the human TRIM and TRIM-like family, we confirmed and extended the domain composition features of these proteins. Within the TRIM modular structure we found that the spacing between adjacent domains is conserved. In fact, the distance between the RING domain and the first B-box, either type 1 or type 2, ranges from 35 to 55 residues; the distance between the two B-box domains ranges from 13 to 20 amino acids; and the spacing, partly occupied by the CC region, between the B-box2 and the C-terminal domain is usually 170–220 residues long. The maintenance of the domain scaffold, order and spacing clearly indicates that the TRIM structure is a functional module.
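As an illustration of how these spacing ranges could be checked against an annotated protein, the sketch below computes the gaps between adjacent domains and flags whether each falls within the range quoted above. The function names, the domain labels, and the 1-based inclusive coordinate convention are assumptions made for this example; since the last range is described as "usually" holding, a value outside it should be read as unusual rather than invalid.

```python
def gap(upstream, downstream):
    """Number of residues strictly between two domains.

    Each domain is a (start, end) tuple, 1-based and inclusive.
    """
    return downstream[0] - upstream[1] - 1


def check_spacing(domains):
    """Compare inter-domain gaps with the ranges quoted in the text.

    `domains` maps a (hypothetical) label to (start, end). Returns a dict
    mapping each evaluated pair to (gap_length, within_quoted_range).
    """
    # The RING-to-B-box distance refers to the first B-box, whichever type.
    first_bbox = "B-box1" if "B-box1" in domains else "B-box2"
    pairs = [
        ("RING", first_bbox, (35, 55)),
        ("B-box1", "B-box2", (13, 20)),
        ("B-box2", "C-term", (170, 220)),
    ]
    report = {}
    for up, down, (lo, hi) in pairs:
        if up in domains and down in domains:
            g = gap(domains[up], domains[down])
            report[(up, down)] = (g, lo <= g <= hi)
    return report
```

With made-up coordinates such as `{"RING": (10, 50), "B-box1": (95, 135), "B-box2": (152, 190), "C-term": (390, 560)}`, the three gaps are 44, 16 and 199 residues, all within the quoted ranges.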
We aligned the sequences of the RING finger domain of all human TRIM proteins to define a general TRIM-specific RING pattern (Additional file 1). Besides the cysteine and histidine residues, which coordinate the two zinc atoms, we found clear preferences for specific residues in positions that are probably required to maintain the cross-brace structure of the RING domain. The loop delimited by the second and third Cys residues has a tighter length range within the TRIM family (on average 11 residues) than within other RING-containing proteins. The second loop, bounded by Cys6 and Cys7, is frequently longer than the 48 residues of the general RING consensus. The RING domain has been associated with the ubiquitylation process and is mainly responsible for the interaction with the ubiquitin conjugating enzymes (E2) in the ubiquitylation cascade process. The different length and composition of the intervening sequences of the RING loops may underlie the binding specificity towards the different E2 enzymes.
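To make the two loops concrete, the sketch below measures them with a regular expression built from the widely quoted general C3HC4 RING consensus, C-X2-C-X(9-39)-C-X(1-3)-H-X(2-3)-C-X2-C-X(4-48)-C-X2-C. This is an assumption for illustration only: it is not the TRIM-specific pattern of Additional file 1, and, by design, it cannot match a second loop longer than 48 residues. The function name and the test sequence are hypothetical, and because the pattern is greedy, a real sequence with additional cysteines may admit more than one match.

```python
import re

# General C3HC4 RING consensus expressed as a regex (assumed here for
# illustration). loop1 lies between Cys2 and Cys3; loop2 between Cys6 and Cys7.
RING_C3HC4 = re.compile(
    r"C.{2}C(?P<loop1>.{9,39})C.{1,3}H.{2,3}C.{2}C(?P<loop2>.{4,48})C.{2}C"
)


def ring_loop_lengths(seq):
    """Return (loop1_length, loop2_length) for the first match, or None."""
    match = RING_C3HC4.search(seq)
    if match is None:
        return None
    return len(match.group("loop1")), len(match.group("loop2"))


# Synthetic sequence with an 11-residue first loop and a 10-residue second loop.
synthetic = (
    "MA" + "C" + "AA" + "C" + "A" * 11 + "C" + "A" + "H" + "AA"
    + "C" + "AA" + "C" + "A" * 10 + "C" + "AA" + "C" + "GG"
)
print(ring_loop_lengths(synthetic))
```

On the synthetic sequence the only possible assignment of ligand positions gives loop lengths of 11 and 10 residues.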
The comparison of the B-box domains from all TRIM and TRIM-like sequences confirmed that the pattern of Cys and His is similar, although clearly distinct, in the two types of B-boxes (Additional file 1) [1,2]. The B-box1 has a short and tight consensus in which, besides the Cys and His that coordinate two atoms of zinc, only two positions show a clear preference for a limited choice of residues. B-box2 sequences are longer than type 1 and their consensus is looser. The cysteine and histidine residues at all 8 possible coordination positions are highly conserved, consistent with the recently reported B-box2 structure definition that revealed the coordination of 2 zinc atoms, in contrast with previous data. Moreover, additional non-polar or hydrophobic residues are also maintained in defined positions. Twenty-two out of 60 TRIM and 8 TRIM-like proteins possess both B-boxes, with B-box1 always preceding B-box2, whilst the remaining proteins have a single type 2 B-box domain (Table 1).
We observed that the third component of the tripartite motif, the Coiled-coil (CC) region, follows the B-box2 in all bona fide human TRIM proteins as well as in six of the eight TRIM-like proteins. Only RNF101/TRIM48* and RNF102/TRIM52* do not possess this Coiled-coil region as they are truncated immediately after the B-box2. In all other cases, the CC region is always confined within 120 amino acids from the end of the B-box2 domain and in approximately 50% of the human TRIM proteins is bipartite (Additional file 2).
The C-terminal domains found in the TRIM and TRIM-like family members are not an exclusive property of this family but are also present in otherwise unrelated proteins [1,4]. The definition of the full complement of human TRIM and TRIM-like proteins allowed us to update the occurrence of C-terminal domains displayed by these proteins (Table 1).
The majority of the human TRIM and TRIM-like proteins (40 members) possess either the SPRY domain or the association of PRY and SPRY domains, also known as the B30.2 or RFP-like domain. Table 1 reports the presence of the PRY and SPRY domains that we found using the domain detection tools described in Methods and detailed in the legend but, due to the complicated and still debated relationship between PRY-SPRY and B30.2 domains, we will herein simply refer to them as SPRY [27-29]. The SPRY domain in turn can be associated with the Fibronectin type III repeat (FN3) and the COS microtubule binding domain in different combinations. Five TRIM proteins display NHL and IGFLMN domains, either in association or alone [31,32]. The C-terminal region of TRIM56 shares sequence similarity with the NHL domain although no clear NHL repeats are detected. Three TRIM proteins and one TRIM-like protein contain a PHD associated with a BROMO domain, a combination that was demonstrated to cooperate in nucleosome binding. Other domains are present in only one member of the TRIM family: the MATH domain in TRIM37; the ARF domain in TRIM23; and the EXOIII domain in TRIM19/PML [16,34,35]. Fifteen TRIM and TRIM-like proteins do not possess a defined C-terminal domain. In these cases, either their coding region is limited to the tripartite motif or the C-terminal portion is not similar to any other known domain (Table 1).
This comprehensive review of TRIM and TRIM-like associated C-terminal domains confirmed that a discrete number of motifs have been selected downstream of the tripartite motif in humans.
To trace the evolutionary origin of the TRIM family, we used human B-box1 and B-box2 sequences as queries to investigate the occurrence of these domains in the genomes of representative prokaryotic and eukaryotic species. We did not find any sequences similar to the B-box domains in prokaryotes. B-box sequences are present in plants with a consensus that is more similar to B-box1 than B-box2 (Fig. 1A). We examined 50 B-box containing proteins from 4 plant species (A. thaliana, O. sativa, P. sativum, B. nigra): the B-box is found alone or associated with a second B-box, with the CCT (CONSTANS, CO-like, and TOC1) domain, or with both. Unlike in mammals, proximal and distal plant B-boxes (we analyzed a total of 60 B-box sequences) are very similar to each other and, consistently, do not separate into distinct branches in phylogenetic analysis (data not shown). No association with RING or Coiled-coil domains was detected in any of the plant proteins analyzed.
Besides plants and metazoans, we found B-box domains in some unicellular eukaryotes (unpublished observation). These protist species possess B-box domains that resemble either the plant or metazoan consensi, but the difficulty in attributing these lineages to specific clades complicates the tracing of the evolution of their B-box domain. In addition, since many of these protists are parasites of metazoans, we cannot rule out the possibility that horizontal gene transfer might have occurred.
Among the metazoans we also searched the genomes of two invertebrate species, Drosophila melanogaster and Caenorhabditis elegans, for the presence of B-box domains. Distinct proximal (B-box1) and distal (B-box2) domains are found in these species, sharing with mammals the same B-box1 pattern and a similar B-box2 consensus (Fig. 1A). The B-box domains in these species associate with a RING domain and a Coiled-coil region in a tripartite motif as in mammals. The tripartite motif is therefore exclusive to metazoans, despite the fact that its constitutive elements are not.
However, these invertebrate organisms have TRIM complements that differ significantly from mammals: the fruitfly has 7 TRIM genes and the worm 18, 12 of which code only for a tripartite motif (Fig. 1B). Fruitfly and worm TRIM proteins share many of the C-terminal domains found in humans; however, their proportion varies among these species, highlighting lineage-specific expansions, e.g. SPRY in humans (Fig. 1B).
Given the numerical and structural complexity of TRIM genes in humans, we sought to characterize the relatedness among members of the family. The presence of different combinations of domains characterized by spaced cysteine and histidine residues rendered a global and reliable alignment of all TRIM and TRIM-like proteins along their entire length difficult. We therefore performed an initial alignment using the B-box2 and Coiled-coil portion. The unrooted phylogenetic tree generated from this alignment supports a recent expansion of the genes that contain the SPRY domain and suggests a preliminary separation of the human TRIM proteins into two main groups based on domain composition and branch topology (Fig. 2). Group 1, composed of 34 proteins (29 TRIM and 5 TRIM-like proteins), includes a high proportion of members with a RING-B1-B2-CC module in combination with all the C-terminal domains found in TRIM proteins. Group 2 is composed of the remaining 34 proteins (31 TRIM and 3 TRIM-like proteins), which possess only the B-box2 domain and are mostly organized as RING-B2-CC-SPRY; the 5 proteins of this group that lack the SPRY domain consist of the tripartite motif alone (Fig. 2).
The alignment of either single domains or combinations of the domains that compose the tripartite motif produces similar tree topologies (Additional file 3). This suggests that co-evolution of the domains present within the tripartite motif has occurred, i.e. this module mainly evolved as a single block, with no evidence of large rearrangements leading to domain acquisition or swapping among different TRIM family members; the only exceptions are the incomplete TRIM proteins that have lost one of the tripartite motif domains.
We investigated whether the two groups show differences at the level of their genomic organization. Considering solely the coding region, group 2 genes span on average 10.3 kbp split in 5.7 coding exons compared to the 45.4 kbp and 8.3 exons of the genes that belong to group 1, differences that are statistically significant (P < 0.01 for gene length and exon number). Besides the average values, the homogeneity of group 2 with respect to these two parameters is striking. In fact, group 2 gene lengths range from 1.4 to 27 kbp, with only 2 genes larger than 20 kbp, whereas group 1 genes are distributed within a larger range, 1.2 to 143 kbp. Homogeneity of group 2 is also observed for the distribution of the number of coding exons: about two thirds of group 2 TRIM genes are composed of 6 or 7 exons and only in one case they span over 10 exons; again the distribution for group 1 is broader, ranging from 1 to 20 exons.
Taken together, group 2 genes are smaller and less complex than group 1. Interestingly, several group 2 genes are clustered in small chromosomal regions, especially within the Major Histocompatibility Complex region in 6p21.33 (Table 1 and Additional file 4) [38,39]. The high homogeneity of group 2 genes and their organization in clusters suggest a more recent origin of this group.
To investigate whether group 1 and 2 have different evolutionary dynamics, we compared the members of the human TRIM and TRIM-like set to their mouse counterparts (see http://TRIMbase.tigem.it and below for the definition of orthologous pairs). Ten out of 60 human TRIM and 8 TRIM-like genes (TRIM4, 5, 22, 43, 48*, 49, 52*, 64, 73, 74) do not have a murine ortholog, whereas TRIM31, 15, 20, and 61 are divergent at the level of entire domains compared to the mouse. Interestingly, these 14 non-conserved or divergent genes fall within group 2 and represent an important proportion of this group (41%). Conversely, all human group 1 genes have a mouse ortholog.
The degree of conservation of the remaining 54 human/mouse pairs is highly variable, ranging from 49% to more than 99% amino acid identity (peptide comparisons are available at http://TRIMbase.tigem.it). Group 2 pairs present on average 78% amino acid identity versus 89.2% of group 1 (P < 0.01). Furthermore, the majority of group 1 proteins show 90–100% amino acid identity against 50–90% of most group 2 proteins (Fig. 3A).
Fast-evolving genes tend to have a higher ratio of nonsynonymous substitutions per nonsynonymous site (Ka) to synonymous substitutions per synonymous site (Ks) than the slow-evolving ones. The average Ka/Ks ratio between human and rodent coding sequences is 0.18. We analyzed the Ka/Ks ratio in human-mouse TRIM and TRIM-like orthologous pairs and found that group 1 genes present on average a ratio of 0.13 versus 0.29 of group 2 (P < 0.01). The Ka/Ks distribution is also significantly different (P = 0.01091) between the two groups: the majority of group 1 genes display values below 0.1 while most of group 2 show values above 0.1 (Fig. 3B). Furthermore, only members of group 2 (TRIM38, 40, 60, 75) exceed a Ka/Ks ratio value of 0.5 (Fig. 3B).
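The Ka/Ks definition given above reduces to a ratio of two per-site rates. The sketch below shows only that arithmetic and a binning by the 0.1 and 0.5 values mentioned in the text; it is a simplified illustration, not the estimation method used in this study (see Methods). In particular, it applies no correction for multiple substitutions at the same site, and the function names and example counts are hypothetical.

```python
def ka_ks_ratio(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Uncorrected Ka/Ks from substitution and site counts.

    Ka = nonsynonymous substitutions per nonsynonymous site,
    Ks = synonymous substitutions per synonymous site.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    ka = nonsyn_subs / nonsyn_sites
    ks = syn_subs / syn_sites
    if ks == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    return ka / ks


def ka_ks_bin(ratio):
    """Place a ratio in one of the three ranges discussed in the text."""
    if ratio < 0.1:
        return "<0.1"
    if ratio <= 0.5:
        return "0.1-0.5"
    return ">0.5"


# Hypothetical counts: 13 nonsynonymous changes over 1000 sites,
# 50 synonymous changes over 500 sites.
example = ka_ks_ratio(13, 1000, 50, 500)
print(round(example, 2), ka_ks_bin(example))
```

With these made-up counts Ka is 0.013 and Ks is 0.1, giving a ratio of about 0.13, which falls in the intermediate bin.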
Taken together, our analyses indicate that group 2 genes evolve more rapidly compared to group 1. This suggests that the two groups may be subject to different evolutionary constraints, likely underlying species-specific adaptations.
Preliminary rounds of sequence alignment and evolutionary analyses allowed us to divide the TRIM and TRIM-like family members into major classes and subsequently generate their phylogenetic trees separately using the full-length protein sequences from man, mouse, fruitfly and worm (see Methods for details). The results show that these proteins are evolutionarily organized in two groups that coincide with group 1 and group 2 (Figs. 4 and 5). The only discrepancy with the previous subdivision is TRIM62, which segregates with group 2 proteins in the evolutionary analysis. Within group 1, TRIM37 did not segregate with any subgroups in preliminary studies and therefore it was used as an outgroup in all phylogenetic analyses.
Group 1 is further divided into subgroups that broadly match the domains downstream of the tripartite motif (Fig. 4). This analysis confirms that members of group 1 are present in both human and mouse with a strict 1:1 orthologous correspondence and are represented in invertebrates (Fig. 4). By means of this phylogenetic analysis we could better appreciate the mammal-invertebrate TRIM relationship and, since the worm and the fruitfly possess many of the C-terminal domains present in mammals, we could follow the late evolution of the TRIM modular structure as discussed below. Interestingly, no invertebrate sequences are found to be homologous to members of group 1 subgroup E, which includes TRIM and TRIM-like proteins with B1, B2, and SPRY in various combinations (Fig. 4E and Table 1). Group 2 proteins are likewise not represented in invertebrate species, and have either a complete or a truncated RING-B2-CC-SPRY domain composition (Fig. 5 and Table 1). The major structural difference between subgroup E and group 2 is the presence of the B-box1, which indicates that group 2 proteins could have derived from a subgroup E member upon loss of B-box1.
Interestingly, the evolutionary analysis of group 2 proteins showed, in addition to 24 pairs of orthologs, the presence of species-specific TRIM and TRIM-like proteins, not only in humans (see above) but also in mouse (Fig. 5), indicating high dynamicity in the evolution of this group compared to group 1.
The analysis of the TRIM and TRIM-like complements of rat, cow, and dog showed that these species have the same set of group 1 genes as human and mouse. Conversely, the sets of group 2 genes are different and specific to each species, including closely related ones such as mouse and rat, with only 17 genes (50–70% of group 2 genes) shared among all species (http://TRIMbase.tigem.it). These differences are due to events of gene duplication, deletion, or degeneration that likely occurred during the evolution of each single lineage. Remarkably, in three cases the same gene has undergone degeneration or deletion independently in different lineages: human TRIM38 orthologs, found in mouse and cow, are pseudogenes in rat and dog; human TRIM60 has orthologous loci in all the examined mammals, but these have become pseudogenes in rat, cow, and dog; human TRIM61 orthologs are found to be pseudogenes or absent in all organisms but mouse. Furthermore, two TRIM genes are present in the five species in three different states: TRIM15 is intact in human and cow, presents premature stop codons in mouse and rat, and is a pseudogene in dog; similarly, TRIM31 is intact in mouse and rat, has distal frameshifts in human and cow, and is absent in dog. Both premature stop codons and frameshifts result in truncated proteins that have lost the SPRY domain. These losses may represent a degenerative step towards the complete inactivation of these genes.
Of note, massive gene duplications and rearrangements have occurred in several group 2 genes that cluster in the same chromosomal location (Table 1). A thorough characterization of the TRIM-rich 6p21.33 locus is reported in human and chicken [39,41,42]. Indeed, in addition to the human genes presented here, extra copies of group 2 genes TRIM43, 48*, 49, and 64 are clustered at 2q11.1, 11q11.1, and 11q14.3 for a total of 11 predicted genes and 14 pseudogenes. These clustered loci have paralogs, but not orthologs, in some of the other examined mammalian species (http://TRIMbase.tigem.it).
A further example of recent gene evolution is the genomic cluster at 11p15.4 containing TRIM5, a gene involved in HIV-1 viral restriction in some primates. This cluster includes TRIM5 and 22, which are currently considered primate-specific, in addition to TRIM6 and 34, which are not [43,44]. By means of comparative and evolutionary analyses (Additional file 5) we found that the entire cluster, including TRIM5 and 22, was present in the last common ancestor of humans, cow, and dog, supporting a common origin for TRIM5 and its cow functional ortholog LOC516599 rather than evolutionary convergence as previously proposed.
Given the situation in mammals, we asked whether the TRIM complements evolved likewise in other vertebrate species. We searched the databases for TRIM and TRIM-like sequences in representative bird and fish species, chick (Gallus gallus) and a pufferfish (Tetraodon nigroviridis), respectively. In addition, we included in our analysis the urochordate Ciona intestinalis, a representative of the early chordate lineage from which the vertebrates originated. The searches were performed combining different iterations of PHI-BLAST (using the B-box2 pattern), TBLASTN and BLASTP against nr protein and nucleotide databases at NCBI, starting from both human TRIM proteins and TRIM sequences found in the above species. We found 10 TRIM sequences in ciona, 37 in chick and 58 in pufferfish; all the sequences we retrieved were present in the databases as assessed or predicted genes and only some of them (especially in chicken) were already annotated as TRIM genes (the sequences retrieved for these species are reported at http://TRIMbase.tigem.it). These novel TRIM complements should not necessarily be regarded as complete since in these species there are still regions not yet sequenced or unequivocally assembled. We compared these additional vertebrate sequences to the human sequences using multiple protein alignments and phylogenetic tree constructions, using as a paradigm the subdivision into groups and subgroups of Figures 4 and 5. Detailed analyses of these novel genes, at the level of transcript, protein and genomic locus, are required and will be addressed elsewhere; therefore, the phylogenetic analyses we present show relationships and are not intended to represent precise evolutionary distances.
In all species analyzed, we found clear orthologs for TRIM23 and TRIM37 (Fig. 6A). With respect to subgroup C, there is one representative of this subgroup in ciona, closely related to the fruitfly member. All the known mammalian members of this subgroup are found in chick with recognizable orthologous relationships (Fig. 6B). This is not completely true for tetraodon, in which 3 members belonging to this group have been found: one orthologous to TRIM33 and the others representing fish-specific duplications related to the TRIM33-24 clade. Genes strictly related to KIAA0298/TRIM66* and TRIM19 have not been found in tetraodon (Fig. 6B). In ciona there are 5 representative members of subgroup A, 3 representing the ancestors of different subclades (TRIM9-67; TRIM1-18; TRIM54-55-63) and 2 apparently more ancestral genes. DIBP/TRIM44* appears to be uniquely represented in mammals. All the other members are present in chick and fish although orthology is clear only for TRIM9, 67, 1, 18, and 36. TRIM46, if truly absent, may have been lost in chick; of the MURF group (TRIM54, 55, 63), 2 orthologues are present in chick, whereas 4 members with no direct orthologous relationship are present in fish (Fig. 6C). An analogous situation is observed for subgroup D. In fact, ciona has two representative members; an orthologous relationship with human is observed in both chick and fish for four members (TRIM2, 3, 45, 71). TRIM32 might have been lost in chick and TRIM56 in both chick and fish (Fig. 6D). For the above group 1 subgroups, we therefore found that most of the mammalian components are present in chick, often with a clearly recognizable orthologous relationship. Although the number of members within these subgroups is also similar in tetraodon, the TRIM and TRIM-like genes in this species, consistent with a larger evolutionary distance, have duplicated and diverged more extensively, sometimes obscuring orthologies.
A different situation is observed with the group 1 subgroup E and, as expected, with group 2. As for the other invertebrates, ciona genes are not represented in this subgroup and in group 2. Orthologous relationship among mammals, chick and fish is observed only for two genes of subgroup E (ATDC/TRIM29* and TRIM65). The other genes within the subgroup are more conserved in chicken while in tetraodon many independent duplication events occurred (Fig. 6E). Even more extensive duplication events and independent divergences are observed for group 2 genes. In this case, close homology among the three species is only recognizable for TRIM35 and 62. A couple of other mammalian TRIM genes are conserved in chick (TRIM7 and 41) but the remaining genes in the three species have been subjected to independent duplications and evolution (Fig. 6F).
To confirm the presence in fish of so many TRIM sequences poorly related to the human TRIM genes, we also searched for TRIM and TRIM-like genes in zebrafish (Danio rerio) using the same criteria and methods described for the chick and tetraodon. Unlike tetraodon, zebrafish presents an elevated number of TRIM and TRIM-like genes; we found 240 entries corresponding to independent genes (the list of zebrafish genes is reported at http://TRIMbase.tigem.it). Also in this case, the number of genes encoding group 1 TRIM proteins (excluding subgroup E) is comparable to the number in mammals, chick and pufferfish (1 subgroup B; 5 subgroup C; 16 subgroup A; 12 subgroup D) although in many cases clear duplication events occurred. However, the great expansion of the TRIM genes in the zebrafish is associated with members belonging to the group 1 subgroup E and group 2 genes (data not shown).
Analyses of TRIM complements in aves and fish corroborate the high conservation of group 1 genes during evolution and highlight the generation of unique sets of group 2 genes in each vertebrate species analyzed. Moreover, the data in non-mammalian vertebrates, especially in tetraodon, confirm that members of the group 1 subgroup E very likely gave rise to group 2 genes.
In conclusion, our study indicates the presence of two distinct TRIM gene groups. Group 1 is evolutionarily more ancient than group 2 and is likely to contain basic functions that are essential to both vertebrate and invertebrate species. On the other hand, group 2 is younger and more dynamic, possibly acting as a sort of TRIM gene "reservoir" to develop novel species-specific functions.
Here, we report the identification and genomic characterization of the full complement of the human TRIM family examined from an evolutionary perspective by comparison with several vertebrate and invertebrate species.
We definitively assessed that the B-box domain is only present within the tripartite module in metazoans, with the few exceptions mentioned above and discussed below, and is therefore the defining domain of the TRIM family. We redefined the B-box1 and B-box2 consensi as well as the TRIM specific RING finger pattern using all human sequences. Within these domains we found conservation not only of the residues putatively involved in metal coordination, but also of other amino acids that compose the novel consensi. It will be interesting to model the RING, B-box1 and B-box2 sequences of the TRIM proteins on these structures to study the possible role of the conserved residues in relation to the ubiquitylation cascade [7-9,24,25].
Based on our analyses, we propose a general model of TRIM structure evolution (Fig. 7). Our studies suggest that the origin of the B-box domain is quite ancient and probably dates back to a common ancestor of plants and metazoans. The plants maintained either one or two B-boxes without apparent sequence differentiation into proximal and distal. Conversely, metazoans differentiated a pair of B-boxes into a proximal and a distal type. In an early step of metazoan evolution, a RING domain and a Coiled-coil region associated with the B-box(es) to generate a solid tripartite module that has been maintained from invertebrates to mammals. From then onward, the tripartite motif has evolved as a single block, and it is frequently encoded by a single exon. Interestingly, we found different mammalian and invertebrate B-box2 patterns. This may reflect functional coupling with specific interactors in each lineage, which would have forced convergence at specific sites. A similar species-specific evolutionary convergence was recently described for sulfatase enzymes and their common post-translational modification factor.
Before the invertebrate-vertebrate lineage split, the tripartite module became associated with a discrete number of C-terminal domains. Addition and loss of C-terminal domains and structure remodeling then occurred in the various evolutionary lineages (Fig. 7). In at least one case the same domain acquisition has occurred independently in primates and arthropods. In fact, in some species of New World monkeys (Aotus genus), Cyclophilin A (CypA) is fused with the tripartite motif of TRIM5. TRIM5 confers a potent block to HIV-1 infection in Old World primates, while CypA enhances infection by direct interaction with the HIV capsid [21,47]. HIV-1 blockage in Aotus cells was explained by the exclusive presence of the TRIM5-CypA chimeric gene [46,48,49]. Interestingly, we found that a tripartite motif is also associated with a Cyclophilin domain in fruitfly CG5071, indicating evolutionary convergence. Independent events of gene fusion are considered a hallmark of functional coupling, which can also be present in those species in which a similar gene fusion is not observed. Similar processes of fusion between functionally associated domains may have been one of the mechanisms underlying the selection of C-terminal domains during TRIM evolution.
The early association of the B-box modules with the RING finger, a domain linked to the ubiquitylation process, has eventually led the proteins possessing a tripartite motif to exert a common basic biochemical function, i.e. that of ubiquitin ligase. The large number of proteins belonging to this family in mammals highlights the success of this module in undertaking its task. Since the TRIM family represents one of the largest RING finger classes, it is tempting to speculate that, among the myriad cellular E3 substrates, a large proportion requires the unique tripartite structure for reasons yet to be discovered. Whereas the tripartite motif may provide the catalytic E3 activity and the ability to form the scaffold of the TRIM-defined sub-cellular compartments, the C-terminal region may contribute to selecting the specific substrate and/or directing the tagged substrate towards downstream pathways. The PHD-BROMO domain, for example, determines the association with chromatin, and the TRIM and TRIM-like proteins containing the PHD-BROMO domain are consistently involved in chromatin remodeling [51,52]. Along the same lines, MID1/TRIM18 and the related TRIM proteins that possess a COS microtubule-binding domain exert their role on the cytoskeleton.
It should be reaffirmed that not all the proteins we included in our study retain an entire tripartite motif. As mentioned in the Results section, we decided to include all the genes encoding proteins with B-box motifs. In human these correspond to the 'complete' TRIM proteins (with RING, B-box and CC) and a few 'incomplete' TRIM proteins (or TRIM-like) presenting only two of the three domains that compose the tripartite motif (either 'RING and B-box' or 'B-box and CC'). This might pose a formal problem as to what should be classified as a TRIM protein. In the classic definition, a protein family is composed of proteins that have a common phylogenetic origin and share a degree of amino acid identity/similarity above an established threshold. In the case of the TRIM family, the initial definition of the family was based on the observation that most members share the tripartite arrangement at their N-terminal portion. What was not clear was whether the TRIM proteins had a common origin or were rather the result of domain swapping from evolutionarily unrelated proteins. Our analyses demonstrate that all the proteins with a RING-B-box-CC module actually have a common evolutionary origin. Therefore, what was raised as an 'operative' definition is now shown to adhere fully to the classic definition of a gene/protein family. Our data allow the same conclusion to be drawn for the 'incomplete' TRIM proteins, which possess the B-box motif and which we found to belong evolutionarily to the TRIM family. In support of this, some of the 'incomplete' TRIM proteins also present C-terminal domains characteristic of the 'complete' TRIM proteins. This parallel cannot be drawn for the other domains present within the tripartite motif; the RING domain, for example, has been 'used' to build many different protein families in association with several TRIM-unrelated domains.
On the other hand, we think that a strict definition of the TRIM family based only on function is not feasible at present. As discussed above, the presence of the RING domain suggests a role as E3 ubiquitin ligases for the TRIM proteins. Experimentally, this has been proven only for some TRIM proteins, and we cannot exclude that others, although containing the RING domain in the proper tripartite motif, might have a different biochemical role. What is the role of the 6 RING-less proteins we included in our study? They may be involved in ubiquitylation as well by, for example, acting as regulators of orthodox TRIM proteins through hetero-interaction. Given that the recently solved structures of the B-box1 and B-box2 domains revealed a strong structural similarity with the RING domain [24,25], it is tempting to speculate that these domains may interact with components of the ubiquitylation machinery and to attribute to these RING-less proteins the role of E3 ubiquitin ligases. Consistent with these observations, we propose to include within the TRIM family all the proteins that are phylogenetically related to established TRIM members and that have a tripartite motif at their N-terminus, including the few examples in which part of this motif has been lost.
The relatively small number of TRIM genes in lower eukaryotes compared to mammals suggests rapid and recent changes in the TRIM family. Our study revealed the presence of two main groups of mammalian TRIM genes that show distinct evolutionary features and that we named group 1 and group 2. Group 1, which is in turn subdivided into several subgroups, is composed of genes that are present in human, mouse, rat, dog, and cow with a one-to-one relationship. Although orthology with mammals is not always recognizable, this group of genes is also highly conserved in number and structure in other vertebrates (chick and fish). Our data on the Ka/Ks ratio of human and mouse group 1 genes suggest that they are subject to purifying selection that conserves their function. It is conceivable that group 1 consists of diversified and essential TRIM and TRIM-like functions for which little or no redundancy is present. Consistently, many group 1 genes are involved in basic cellular processes, such as cell cycle progression and transcriptional regulation, and result, when mutated, in developmental disorders, muscular phenotypes, cancer onset, etc. [2,6]. Some group 1 TRIM genes have also been found to be involved in viral response, namely TRIM1, TRIM19/PML [3,54,55] and TRIM32. TRIM19/PML, besides its involvement in acute promyelocytic leukemia, has been shown to interfere with the replicative cycle of many DNA and RNA viruses, and evidence indicates that it may represent a broad-spectrum cellular defence factor.
The important increase in TRIM number in vertebrates is primarily due to the buildup of the genes that constitute group 2. Group 2 is in fact evolutionarily more recent than group 1, is not represented in invertebrates, and evolves at a faster rate. Interestingly, many TRIM proteins that belong to group 2 have recently been associated with cellular innate immunity towards viral infection. In addition to TRIM5α, other members are being investigated as potential retrovirus restriction factors. Among them, TRIM21, 22 and 34 are regulated by interferons, a family of secreted proteins that exert antiviral and immunomodulatory activities. Moreover, other group 2 genes, PYRIN/TRIM20* and TRIM21, are involved in immune-related diseases [20,58,59]. Interestingly, group 2 proteins share the same C-terminal motif, the SPRY domain. In the case of TRIM5α, this domain is responsible for the species-specific HIV-1 restriction and is subject to positive selection in primates, consistent with a possible role in directing and specifying capsid recognition [44,60]. Of note, the SPRY domain is also present in SOCS proteins, involved in cytokine signaling and innate immunity, and in the BTN family of lymphoid expressed proteins, possibly involved in immune regulation [28,61,62]. It has been proposed that the sharing of the SPRY domain between the TRIM and BTN family members located within the MHC locus is somehow linked to their immunological function. The SPRY domain might therefore confer on group 2 proteins the ability to specifically recognize viral capsids and interfere with early steps of viral infection. Unlike the anti-viral group 2 proteins, group 1 TRIM19/PML interferes with general mechanisms of viral replication common to various viruses and, consistently, is not subject to positive selection.
Our comparative analysis in five mammalian species shows that subsets of group 2 TRIM and TRIM-like genes are different and specific in each examined lineage. This is more evident in the three non-mammalian vertebrates analyzed, where large numbers of newly identified group 2 genes mainly lie on species-specific evolutionary clades. This observation might reflect dispensability/redundancy of some group 2 genes, which could have provided the basis for novel species-specific roles during evolution. The presence of clusters composed of massively duplicated group 2 genes, in mammals but also in chick, suggests that they may be hot-spots for TRIM gene production and remodeling. In the case of the teleost fish species, some of the duplications may be remnants of the whole genome duplication event early in the teleost lineage. Global duplication is not, however, enough to explain the large and independent expansion of subgroup E and group 2 genes in the teleosts, and a different cause must underlie these expansions. It is interesting to note that, like the group 2 TRIM and TRIM-like genes, other families of genes involved in innate immune response, and in particular the components that interact with pathogens, have been subject to large lineage-specific expansions in the teleost fish. Moreover, since we observed that group 2 genes tend to have a higher human/mouse Ka/Ks ratio than group 1 genes, it is tempting to speculate that some group 2 genes other than TRIM5α may be subject in some species to positive selection at specific sites in the course of species-specific battles against viral infections, as has been shown for other families of genes involved in innate cellular immunity [64-66].
We found that the TRIM domain structure is an innovation of metazoans. The growing evidence for a common biochemical function of the TRIM proteins as ubiquitin ligases justifies the maintenance of their basic modular structure throughout evolution. Our studies indicate the presence of two distinct TRIM gene groups. Group 1 is evolutionarily more ancient than group 2 and is likely to contain basic functions that are essential to both vertebrate and invertebrate species. On the other hand, group 2 is younger and more dynamic, possibly acting as a sort of TRIM gene "reservoir" to develop novel functions. Since some of the TRIM genes that belong to this group are implicated in innate immune response, we propose that the different selection we observed for this group of genes reflects pressure towards the rapid changes needed in species-specific battles against viral infection.
Known mammalian TRIM and TRIM-like gene/protein sequences were retrieved from the National Center for Biotechnology Information (NCBI) and re-defined by searching against their respective genome assemblies using BLAT at the UCSC genome browser (http://genome.ucsc.edu). All the 'corrected' sequences were then used as queries to search for potential novel TRIM genes within the human, mouse, rat, cow, and dog genomes, using BLAT at the UCSC and TBLASTN at the NCBI genome browsers, respectively. All searches were performed in several iterations using default parameters. Human nr and EST databases were screened with PHI-BLAST in several iterations using the patterns previously defined for B-box1 and B-box2. We also searched the human, mouse, rat, cow, and dog proteomes using the B-box2 as bait in five iterations of the PHI-BLAST program. This provides a highly sensitive analysis because the B-box2 is a distinctive constituent of TRIM proteins and does not produce a large amount of background noise. The retrieved amino acid sequences were subsequently used to search the respective genomes to identify their encoding loci. Representative B-box2 sequences were also used as queries for TBLASTN searches of the five mammalian genomes to identify all the potential loci encoding TRIM and TRIM-like proteins. All the retrieved genomic sequences were aligned to the available cDNA/EST sequences to infer the gene architectures. For genes that lacked a transcript counterpart in public databases, we carefully examined the genomic sequences by hand, BLAST-comparing them to the putative most closely related ortholog or paralog and looking for splicing donor and acceptor signals to define the exon-intron boundaries. Constructed open reading frames (ORFs) were conceptually translated into amino acid sequences and checked against their closest homologs.
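As a rough illustration of the pattern-based screening step, the sketch below converts a PROSITE-style pattern (the syntax accepted by PHI-BLAST) into a regular expression and scans a protein sequence with it. The pattern and the sequence used here are invented toy values, not the published B-box1 or B-box2 consensi, and the function names `prosite_to_regex` and `find_motif` are hypothetical helpers introduced only for this example; the sketch covers only simple pattern elements.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern (e.g. 'C-x(2)-H') to a regex.

    Handles single residues, x (any residue), [ABC] (allowed set),
    {ABC} (forbidden set) and repeat counts such as (2) or (7,10).
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat suffix.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        spec, low, high = m.group(1), m.group(2), m.group(3)
        if spec == "x":
            core = "."
        elif spec.startswith("[") and spec.endswith("]"):
            core = spec
        elif spec.startswith("{") and spec.endswith("}"):
            core = "[^" + spec[1:-1] + "]"
        else:
            core = re.escape(spec)
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


def find_motif(sequence: str, pattern: str):
    """Return (start, end) spans of non-overlapping matches in a sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.end()) for m in regex.finditer(sequence)]


# Toy pattern and toy sequence, for illustration only.
TOY_PATTERN = "C-x(2)-H-x(7,10)-C"
TOY_SEQUENCE = "AACPAHAAAAAAACAA"
print(prosite_to_regex(TOY_PATTERN))
print(find_motif(TOY_SEQUENCE, TOY_PATTERN))
```

A plain regular-expression scan only reports where a pattern occurs; PHI-BLAST additionally uses the pattern occurrence to seed a sequence similarity search, which this sketch does not attempt.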
The original genome sequencing traces (Traces-WGS), which are available at the NCBI web site, were checked when the constructed coding sequences presented either stop codons or ORF frame-shifts. When a difference between WGS traces and the genomic assembly was evident, the constructed sequence was corrected accordingly. Comparison of human, mouse, rat, cow, and dog orthologous TRIM and TRIM-like genes showed that >99% of splicing acceptor and donor sites were conserved at the same relative position in the coding sequence in all species, i.e. the gene structure of TRIM genes is essentially identical among different mammals. To retrieve TRIM genes from ciona (Ciona intestinalis), chick (Gallus gallus), pufferfish (Tetraodon nigroviridis) and zebrafish (Danio rerio) we used a combination of PHI-BLAST and TBLASTN against nr protein and nucleotide databases at NCBI, respectively. The following genome releases have been used for this work: Homo sapiens, May 2004 assembly (NCBI Build 35); Mus musculus, February 2006 assembly (NCBI Build 36); Rattus norvegicus, June 2003 assembly (Baylor College of Medicine HGSC v.3.1); Bos taurus, March 2005 assembly (Baylor College of Medicine Btau_2.0); Canis familiaris, May 2005 assembly (Broad Institute CanFam2.0). Drosophila melanogaster TRIM genes: CG1624, CG5206, CG12218, CG8419, CG5071, CG10719, CG31721. Caenorhabditis elegans TRIM genes: arc1, B0281, ZK1240.1, F43C11.8, ZK1240.2, F43C11.7, ZK1240.9, ZK1240.3, ZK1240.8, ZK1240.6, C28G1.6, K09F6.7, lin41, C39F7, nhl-2, nhl-3, ncl-1, F47G9.
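The kind of check that would flag a constructed coding sequence for comparison against the sequencing traces can be sketched as follows. This is a minimal illustration under our own assumptions, not the procedure actually used: the function name `check_orf` is hypothetical, the standard stop codons (TAA, TAG, TGA) are assumed, and the example sequences are invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed


def check_orf(cds: str) -> dict:
    """Report features of a coding sequence that suggest a stop codon
    or frame-shift problem worth checking against the original traces."""
    seq = cds.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Stop codons anywhere except the last complete codon are suspicious.
    internal_stops = [
        index for index, codon in enumerate(codons[:-1]) if codon in STOP_CODONS
    ]
    return {
        "length_multiple_of_three": len(seq) % 3 == 0,
        "internal_stop_codons": internal_stops,
        "ends_with_stop": bool(codons) and codons[-1] in STOP_CODONS,
    }


# Invented example: the second codon is a premature stop.
print(check_orf("ATGTAGAAATAA"))
```

A sequence whose length is not a multiple of three, or that contains an internal stop codon, would be a candidate for re-inspection; the check does not by itself say whether the assembly or the gene model is at fault.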
To identify and analyze the domain composition of the TRIM and TRIM-like protein products we used the major alternative splicing isoform, if more than one was available, and utilized different domain prediction programs. First, we submitted the TRIM amino acid sequences to the SMART tool, where we analyzed each sequence against the SMART and Pfam domain databases simultaneously. The accession identifiers of the C-terminal domains found within the TRIM sequences are as follows: MATH, SM00061; PHD, SM00249; BROMO, SM00297; IGFLMN, SM00557; EXOIII, SM00479; FN3, SM00060; PRY, SM00589; SPRY, SM00449; ARF, SM00177; NHL, PF01436 (Pfam). The tripartite motif domains were additionally analyzed as described below. Besides the SMART results, the RING and B-box domains were also defined by hand in each TRIM and TRIM-like protein using the previously published patterns. To obtain new profiles, the sequences corresponding to each domain were then aligned using the PRATT 2.1 program, and the best scoring consensi were selected and integrated by hand. The order of the sequences in the alignment shown reflects their degree of sequence conservation. The region of each TRIM and TRIM-like protein immediately after the last Cys or His of the B-box2 domain was analyzed for Coiled-coil prediction with the Coil 2.2 program. Analysis was performed with the MTIDK and MTK matrices, using both the weighted option, which takes into account the polarity of the residue within the predicted Coiled-coil heptad repeat, and the unweighted option. When differences of around 20–30% in Coiled-coil prediction were observed between the different methods utilized, the prediction was considered unreliable and was not indicated in the list of Additional file 2. Moreover, only percentages of prediction higher than 50% were considered, using two residue windows, 21 and 28 amino acids.
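The acceptance rule for Coiled-coil predictions described above can be summarized in a short sketch. The exact thresholds and how they were combined are not fully specified in the text, so the values below are assumptions: a prediction is rejected when the spread between methods reaches 20 percentage points (the text says "around 20–30%"), and it is kept only if every method and window exceeds 50%. The function name `accept_coiled_coil` and the score values are hypothetical.

```python
from itertools import product


def accept_coiled_coil(scores, max_spread=20.0, min_percent=50.0):
    """Decide whether a Coiled-coil prediction is kept.

    scores maps (matrix, weighted, window) to a prediction percentage.
    Both thresholds are assumptions made for this sketch.
    """
    values = list(scores.values())
    if not values:
        return False
    # Disagreement between methods: difference of highest and lowest score.
    spread = max(values) - min(values)
    return spread < max_spread and min(values) > min_percent


# All eight method combinations named in the text: two matrices,
# weighted or unweighted, and windows of 21 and 28 residues.
methods = list(product(("MTIDK", "MTK"), (True, False), (21, 28)))

consistent = {method: 90.0 for method in methods}   # invented scores
discordant = dict(consistent)
discordant[("MTK", False, 21)] = 60.0               # one method disagrees

print(accept_coiled_coil(consistent))
print(accept_coiled_coil(discordant))
```

Exposing the two thresholds as parameters makes it easy to test how sensitive the final list of predicted Coiled-coil regions is to the chosen cut-offs.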
Plant B-box containing proteins (from A. thaliana, O. sativa, P. sativum, B. nigra) were retrieved from the SMART B-box database .
To perform phylogenetic analysis, TRIM and TRIM-like protein sequences were aligned using MultAlin in a multi-step process. Only proteins containing the complete module R-B1-B2-CC were aligned in a first step, eliminating from each sequence the portion downstream of the coiled-coil domain and all segments that caused a gap to interrupt the alignment. A first phylogenetic tree was produced starting from this multi-alignment. In a subsequent step, each of the remaining protein sequences was individually added to the multi-alignment, edited to remove excess amino acids, and assigned to a TRIM subgroup after inspection of the topology of the resulting phylogenetic tree. Once all TRIM and TRIM-like protein sequences were assigned to a subgroup, phylogenetic analyses were performed independently for each subgroup. TRIM37 was not included in any subgroup and was therefore used as an outgroup in all final analyses. Nucleotide sequences of TRIM5/6/22/34 and related non-human sequences were also aligned using MultAlin, but in this case a gap-removal step was not necessary due to the high similarity among all considered sequences. Neighbor-Joining and bootstrap analyses were performed with Phylo_win, computing the distances among sequences with all the methods available in the package (protein analysis: observed divergence with and without Poisson correction; PAM distance. DNA analysis: observed divergence; Jukes and Cantor distance; Kimura distance; Tajima and Nei distance; HKY distance; Galtier and Gouy distance; and LogDet distance) (and references therein). Gap-removal was set as pairwise rather than global to minimize information loss. Bootstrap values were computed over 1000 repetitions. All tree topologies coincided across the different methods for branches with a bootstrap value >50. Evaluation of Ka/Ks values for pairs of human-mouse TRIM- and TRIM-like-coding sequences was performed at the Norwegian bioinformatics platform.
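To make two of the distance measures named above concrete, the following sketch computes the observed divergence (the proportion p of differing sites) between two aligned protein sequences with pairwise gap removal, and the Poisson-corrected distance d = -ln(1 - p). It is an independent illustration, not the Phylo_win implementation; the function names and the toy alignment are hypothetical.

```python
import math


def observed_divergence(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Pairwise gap removal: a column is skipped only if one of these
    two sequences has a gap there, regardless of other sequences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns shared by the two sequences")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(p: float) -> float:
    """Poisson correction for multiple substitutions: d = -ln(1 - p)."""
    if p >= 1.0:
        raise ValueError("distance undefined when all sites differ")
    return -math.log(1.0 - p)


def distance_matrix(aligned: dict, corrected: bool = True) -> dict:
    """Pairwise distances for a name -> aligned sequence mapping."""
    names = sorted(aligned)
    matrix = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            p = observed_divergence(aligned[first], aligned[second])
            matrix[(first, second)] = poisson_distance(p) if corrected else p
    return matrix


# Invented toy alignment, for illustration only.
toy = {"seqA": "ACDE-F", "seqB": "ACDQGF", "seqC": "ACDEGF"}
print(distance_matrix(toy, corrected=False))
```

With pairwise gap removal the gapped column in seqA is ignored only in comparisons involving seqA, whereas global removal would discard that column for every pair; this is the information loss that the pairwise setting avoids. A distance matrix of this form is the input to Neighbor-Joining tree construction.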
The quantitative parameters of the two groups (gene lengths, exon number, amino acid identity and Ka/Ks ratios) were compared using the two-sample t-test (two-tailed, assuming unequal variances) in the Microsoft Excel statistical package. The Ka/Ks distributions of the two groups were compared using a two-sample Kolmogorov-Smirnov test.
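For readers who wish to reproduce this kind of comparison outside a spreadsheet, the test statistics can be sketched in plain Python. The code computes the unequal-variance (Welch) t statistic with its approximate degrees of freedom, and the two-sample Kolmogorov-Smirnov statistic D; it does not compute p-values, for which a statistics library would normally be used (for example `scipy.stats.ttest_ind` with `equal_var=False`, and `scipy.stats.ks_2samp`). The function names and the numbers are invented for illustration and are not data from this study.

```python
import math
from bisect import bisect_right
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Two-sample t statistic assuming unequal variances, with the
    Welch-Satterthwaite approximation of the degrees of freedom."""
    var_a = variance(sample_a) / len(sample_a)
    var_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute
    difference between the two empirical distribution functions."""
    sorted_a, sorted_b = sorted(sample_a), sorted(sample_b)
    largest = 0.0
    for value in sorted(set(sorted_a + sorted_b)):
        cdf_a = bisect_right(sorted_a, value) / len(sorted_a)
        cdf_b = bisect_right(sorted_b, value) / len(sorted_b)
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest


# Invented toy values standing in for a per-gene parameter in two groups.
group_one = [1.0, 2.0, 3.0, 4.0]
group_two = [2.0, 3.0, 4.0, 5.0]
print(welch_t(group_one, group_two))
print(ks_statistic(group_one, group_two))
```

The t-test compares group means, while the Kolmogorov-Smirnov test is sensitive to any difference in the shape of the two distributions, which is why it suits the comparison of Ka/Ks distributions.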
MS carried out the evolutionary studies and contributed to the identification of TRIM genes in different species; he contributed to the design of the experiments and to the writing of the manuscript. SC identified the entire set of TRIM genes in human and performed the single-domain alignments. BF collected all the data in the database and performed statistical analyses. AB contributed to the interpretation of the data and to drafting the manuscript. GM conceived, designed and coordinated the study and wrote the manuscript.
The complete sets of human, mouse, rat, dog and cow TRIM and TRIM-like genes and pseudogenes as well as their sequence comparisons are available at http://TRIMbase.tigem.it. At the same site are the TRIM related sequences from ciona, chick, tetraodon, and a list with the accession numbers of the zebrafish TRIM-like genes. See Additional files 1 to 5.
Includes the alignments of the RING, B-box1 and B-box2 domains of all the human TRIM and TRIM-like proteins, alignments from which the consensi for these domains have been generated.
Reports the values of Coiled-coil predictions for all the human TRIM and TRIM-like proteins.
Shows the unrooted phylogenetic trees generated from the alignments of single domains of the tripartite motif.
Shows a schematic representation of the human TRIM genomic clusters.
Shows comparative and evolutionary analyses of the cluster of TRIM5, 6, 22, and 34 in mammals.
We thank Mario Traditi and Angelo Raggioli for informatic assistance and Luciana Esposito and Adriana Zagari for helpful discussion; we are grateful to Graciana Diez-Roux, Anna Savoia, Elena I. Rugarli, and Henrik Kaessmann for critical reading of the manuscript. This work was supported by the 'Italian Telethon Foundation' (TGMP4.2 to GM).