CCCH type zinc finger proteins are RNA binding proteins with regulatory functions at all stages of mRNA metabolism. The best-characterized member, tristetraprolin (TTP), binds to AU-rich elements in the 3' UTRs of unstable mRNAs, mediating their degradation. In kinetoplastids, CCCH type zinc finger proteins have been implicated in the regulation of the life cycle and possibly the cell cycle. To date, no systematic listing of CCCH proteins in kinetoplastids is available.
We have identified the complete set of CCCH type zinc finger proteins in the available genomes of the kinetoplastid protozoa Trypanosoma brucei, Trypanosoma cruzi and Leishmania major. One fifth (20%) of all CCCH motifs fall into non-conventional classes, and many of these had not been identified previously. One third of all CCCH proteins have more than one CCCH motif, suggesting multivalent RNA binding. One third have additional recognizable domains. The vast majority are unique to the Kinetoplastida or to a subgroup within them. Two exceptions are of interest: the putative orthologue of the mRNA nuclear export factor Mex67 and a 3'-5' exoribonuclease restricted to Leishmania species. CCCH motifs are absent from these proteins in other organisms and might be novel features unique to the kinetoplastid homologues. Of the others, several have a predicted, and in one case experimentally confirmed, connection to ubiquitination pathways; one of them, for instance, is a HECT-type E3 ubiquitin ligase. The total number of kinetoplastid CCCH proteins is similar to the number in higher eukaryotes but lower than in yeast. A comparison of the genomic loci of the Trypanosomatidae homologues provides insight into the evolution of both the CCCH proteins and the CCCH motifs.
This study provides the first systematic listing of the Kinetoplastida CCCH proteins. The number of CCCH proteins with more than one CCCH motif is larger than previously estimated, owing to the identification of non-conventional CCCH motifs. Experimental approaches are now needed to examine the functions of the many unique CCCH proteins, as well as those of the putative Mex67 and the Leishmania 3'-5' exoribonuclease.
Pathogenic kinetoplastid protozoa, such as the widely studied 'Tritryps' Trypanosoma cruzi (Tc), Leishmania major (Lm) and Trypanosoma brucei (Tb), have complex biphasic life cycles and consequently require changes in gene expression in response to extrinsic and intrinsic signals. For instance, at least 5% of all Tb genes are developmentally regulated at the mRNA level between any two of the experimentally tractable life cycle stages [1-4]. Kinetoplastids regulate protein-coding gene expression almost exclusively at the post-transcriptional level, with the aid of RNA binding proteins (reviewed in ). One group of RNA binding proteins is defined by the presence of a CCCH type zinc finger motif that binds directly to RNA. Different CCCH proteins regulate all stages of the mRNA life cycle; amongst the best studied are the proteins of the TIS11 family, the best characterized of which is the mammalian protein tristetraprolin (TTP). TIS11 proteins bind to AU-rich elements in the 3' UTRs of their target mRNAs, in most cases mediating their degradation (reviewed in ). The likely mechanism is the recruitment of mRNA degradation factors, many of which have been found to interact with TIS11 proteins, to the target mRNAs [7-9]. Other CCCH proteins, for instance the C. elegans protein POS-1, control the translation of their target mRNAs [10,11]. The Drosophila CCCH protein ZC3H3 regulates mRNA adenylation and nuclear export and also binds to known nuclear export factors . Five Arabidopsis CCCH proteins have been shown to possess intrinsic endonuclease activity, including the orthologue of the polyadenylation specificity factor CPSF30 [13,14]. CCCH proteins have between 1 and 6 CCCH motifs. These motifs were originally defined as C-X6-14-C-X4-5-C-X3-H but were recently redefined as C-X4-15-C-X4-6-C-X3-H, following the genome-wide analysis of the rice and Arabidopsis CCCH proteins .
As part of a project aimed at understanding the regulation of nuclear export in trypanosomes, a putative orthologue of the yeast nuclear export factor Mex67 was identified in T. brucei. The finding of a CCCH motif in the putative Mex67 prompted us to identify and compare the entire set of CCCH proteins in the genomes of the Kinetoplastida. A previous screen for the two most common CCCH motifs (C-X7-C-X5-C-X3-H and C-X8-C-X5-C-X3-H) in the Tritryp genomes identified 50, 68 and 41 CCCH proteins in Tb, Tc and Lm, respectively [17,18]. In addition, some proteins containing one of the common CCCH motifs also contained a C-X10-C-X5-C-X3-H motif [17,18]. For ease of reading, CCCH motifs in the two most common classes, C-X7-C-X5-C-X3-H and C-X8-C-X5-C-X3-H, are called 'conventional' here, and all others, such as C-X10-C-X5-C-X3-H, 'non-conventional'. The term is used only to highlight a difference and does not imply that these motifs are less likely to be genuine CCCH motifs .
Only three of the CCCH zinc finger proteins were identified as having readily apparent orthologues in other organisms: the splicing factor U2AF35 and two components of the mRNA cleavage and polyadenylation apparatus, CPSF30 and FIP1 [20,21]. Of the previously identified CCCH proteins unique to kinetoplastids, two families have been characterized experimentally: (i) the ZFP CCCH proteins involved in the regulation of differentiation [22-25] and (ii) the cycling sequence binding proteins (CSBPs), which bind a conserved sequence in S-phase regulated mRNAs [26-28]. The vast majority of the trypanosome CCCH proteins defined by the conventional CCCH motifs appeared to have only one CCCH finger, whereas nearly two thirds of the Arabidopsis and rice proteins have at least two . Binding of TTP to AU-rich elements depends on two intact CCCH motifs (a single motif is not sufficient), and it has been speculated that in trypanosomes such multivalent RNA binding may be achieved by oligomerization, as occurs between the CCCH proteins of the ZFP family [24,30].
Here, an extended analysis of the CCCH type zinc finger proteins in the genomes of the Tritryps is presented. Including non-conventional CCCH motifs in the search increased the fraction of CCCH proteins with more than one CCCH motif to one third and led to the identification of many novel CCCH proteins. One example is the putative orthologue of the nuclear export factor Mex67, whose mammalian and fungal counterparts have no CCCH motif.
The Tritryp genome database  was searched for all proteins containing the motif C-X4-15-C-X4-6-C-X3-H. This motif covers both the conventional (C-X7-C-X5-C-X3-H and C-X8-C-X5-C-X3-H) and the more recently defined non-conventional CCCH motifs. Some motifs were recognized more than once because they contained additional cysteine residues; in these cases, the sequence closest to a conventional motif was used for further analysis. Protein sequences derived from genes annotated as pseudogenes in the databases were removed after the initial screen (see materials and methods for details). The resulting set contained 131, 233 and 120 proteins from Tb, Tc and Lm, respectively, each with between one and five putative CCCH motifs. The larger number for Tc results from pairs of allelic variants, as the genome strain (CL Brener) is a hybrid . The CCCH motifs belonged to all possible classes, but there was a clear enrichment of the two conventional classes. Of all candidate CCCH motifs, 52/158 (Tb), 106/292 (Tc) and 47/148 (Lm) were conventional CCCH motifs that were also recognized by one or more of SMART (Sm00356), Pfam (PF00642) or InterPro (IPR000571). These 205 CCCH motifs (the training set) were used to produce sequence logos [Additional file 1]. There was very little difference between the sequence logos derived from each of the Tritryps [Additional file 1] and little difference between the sequence logos of the two conventional classes of CCCH motifs (Figure 1A and 1B). As expected for CCCH motifs recognized by Pfam, the logos were similar to the Pfam sequence logo for CCCH motifs and to the logos for rice and Arabidopsis CCCH proteins . A combined sequence logo was created (Figure 1C) and used to evaluate all non-conventional CCCH motifs and all conventional CCCH motifs not recognized by Pfam, SMART or InterPro.
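To make the screening step concrete, a minimal sketch in Python follows. It is not the Tritryp database motif search used in this study; it simply enumerates every cysteine/cysteine/cysteine/histidine combination matching C-X4-15-C-X4-6-C-X3-H, so that overlapping candidates created by additional cysteines are all reported, and then keeps one candidate per histidine. Grouping candidates by their histidine and the distance measure used to pick the spacing "closest to a conventional motif" are assumptions of this sketch, and the names CCCHMotif, find_candidates and collapse_by_histidine are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CCCHMotif:
    """Hypothetical container: 0-based indices of the three Cys and the His."""
    c1: int
    c2: int
    c3: int
    h: int

    @property
    def spacing(self) -> tuple[int, int, int]:
        # Residues between consecutive conserved positions (the X counts).
        return (self.c2 - self.c1 - 1, self.c3 - self.c2 - 1, self.h - self.c3 - 1)

    @property
    def label(self) -> str:
        x1, x2, x3 = self.spacing
        return f"C-X{x1}-C-X{x2}-C-X{x3}-H"

    def is_conventional(self) -> bool:
        # Conventional classes: C-X7-C-X5-C-X3-H and C-X8-C-X5-C-X3-H.
        x1, x2, _ = self.spacing
        return x1 in (7, 8) and x2 == 5


def find_candidates(seq: str) -> list[CCCHMotif]:
    """Every Cys/Cys/Cys/His combination matching C-X4-15-C-X4-6-C-X3-H."""
    seq = seq.upper()
    hits = []
    for h, residue in enumerate(seq):
        c3 = h - 4  # exactly three residues between C3 and H
        if residue != "H" or c3 < 0 or seq[c3] != "C":
            continue
        for c2 in range(c3 - 7, c3 - 4):  # 4-6 residues between C2 and C3
            if c2 < 0 or seq[c2] != "C":
                continue
            for c1 in range(c2 - 16, c2 - 4):  # 4-15 residues between C1 and C2
                if c1 >= 0 and seq[c1] == "C":
                    hits.append(CCCHMotif(c1, c2, c3, h))
    return hits


def distance_to_conventional(m: CCCHMotif) -> int:
    """Assumed measure: how far the first two spacers are from X7/X8 and X5."""
    x1, x2, _ = m.spacing
    return min(abs(x1 - 7), abs(x1 - 8)) + abs(x2 - 5)


def collapse_by_histidine(candidates: list[CCCHMotif]) -> list[CCCHMotif]:
    """Keep one candidate per His: the one closest to a conventional spacing."""
    best: dict[int, CCCHMotif] = {}
    for m in candidates:
        current = best.get(m.h)
        if current is None or distance_to_conventional(m) < distance_to_conventional(current):
            best[m.h] = m
    return sorted(best.values(), key=lambda m: m.h)


if __name__ == "__main__":
    toy = "CAC" + "A" * 7 + "C" + "A" * 5 + "CAAAH"  # extra Cys at index 0
    for motif in collapse_by_histidine(find_candidates(toy)):
        print(motif.label, motif.is_conventional())
```

For the toy sequence, which carries an extra N-terminal cysteine, two candidates are found (C-X9-C-X5-C-X3-H and C-X7-C-X5-C-X3-H) and only the conventional spacing is kept, so the script prints `C-X7-C-X5-C-X3-H True`.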
First, a CCCH motif candidate was considered a likely CCCH motif if it matched at least two of the four conserved residues coloured red in the sequence logo in Figure 1: F/Y/H/W at position C1+3, G at position C2-2, G at position C3-3 and F/Y/Q at position C3+2. This criterion was stringent: it excluded 5 of the 205 sequences in the training set as well as the cycling sequence binding protein A (CSBPA) of Lm, although the Tb and Tc orthologues of CSBPA were recognized. Therefore, in a second step, CCCH finger candidates not recognized by the first criterion were also considered likely CCCH motifs if they had three or more identities to the amino acids coloured either red or green in the sequence logo in Figure 1. In this way, all 205 CCCH fingers of the training set and LmCSBPA were included. The list now contained 66 (Tb), 133 (Tc) and 78 (Lm) candidate CCCH zinc finger proteins. In a third step, proteins that were unlikely to be CCCH zinc finger proteins were removed manually: (i) cell surface proteins or proteins with a known function unrelated to nucleic acids and (ii) CCCH motif candidates whose homologues in the other Tritryps lacked the CCCH motif. In most cases, the manually excluded proteins had non-conventional CCCH motifs with little similarity to the sequence logo, supporting the conclusion that they were not genuine CCCH zinc fingers. The exceptions were five proteins that were classified as 'possible CCCH zinc finger proteins'; these were excluded from the final list but are discussed below. After removing allelic variants from the Tc set, the list of CCCH proteins contained 51 (Tb), 53 (Tc) and 54 (Lm) sequences [Additional file 2]. To limit redundancy, CCCH proteins encoded by genes present in tandem arrays at the same genomic locus and sharing more than 40% amino acid sequence identity (e.g. gene products of recently duplicated genes) were counted as one, which reduced these numbers to 48 (Tb), 51 (Tc) and 54 (Lm).

The complete list of genes is shown schematically in Figure 2. Together, the Tritryps have 62 non-redundant CCCH proteins. Of these, 39 are present in all three species; the remaining 23 are unique to one or two of the Tritryps (Figures 2 and 3A). Roughly one third, 16/48 (Tb), 20/51 (Tc) and 16/54 (Lm), have more than one CCCH motif (Figures 2 and 3B). Many of these have one non-conventional CCCH motif, which is why they had not previously been classified as containing multiple CCCH fingers. Overall, 20% of all CCCH motifs were non-conventional, with C-X7-C-X4-C-X3-H, C-X8-C-X4-C-X3-H and the previously identified C-X10-C-X5-C-X3-H being the most abundant non-conventional classes (Figure 3C) [Additional file 2]. About one third of all CCCH proteins, 14/48 (Tb), 15/51 (Tc) and 18/54 (Lm), have other recognizable domains (Figure 2). In addition to the domains present in the experimentally characterized CCCH proteins described above, these included other zinc finger domains (ZnF_C2H2, Zf-RanBP, ZnF_HIT), a cytidine deaminase domain, a helicase domain, a DNAJ domain, an exonuclease domain, a HECT domain and a U-box. The vast majority of CCCH proteins are unique to the Kinetoplastida, or even to a subgroup of the Kinetoplastida. Most have no similarity to proteins of other eukaryotes; some have similarities restricted to conserved domains. Two exceptions were a putative Mex67 orthologue and a Leishmania-specific 3'-5' exoribonuclease, discussed below. Using the identity between Tb and Lm for illustration, the most conserved protein was the splicing factor U2AF35 (64%), followed by ZC3H9 (52%), ZC3H17 (49%), CPSF30 (48%), ZC3H23 (48%) and ZC3H41 (48%) (Figure 2).
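The two-step evaluation of candidate motifs described above can be sketched as follows, assuming the positions of the three cysteines are already known (for example from a motif search). RED_POSITIONS encodes the four conserved residues listed in the text. The green residues of the logo are not listed in the text, so they must be supplied by the caller; the values used in the example are hypothetical placeholders, and the function names are illustrative only.

```python
# (anchor residue, offset from anchor, residues counted as a match)
RED_POSITIONS = [
    ("c1", +3, frozenset("FYHW")),  # F/Y/H/W at C1+3
    ("c2", -2, frozenset("G")),     # G at C2-2
    ("c3", -3, frozenset("G")),     # G at C3-3
    ("c3", +2, frozenset("FYQ")),   # F/Y/Q at C3+2
]


def merge_positions(*position_lists):
    """Combine position lists; residues allowed at the same position are pooled."""
    table = {}
    for positions in position_lists:
        for anchor, offset, allowed in positions:
            table.setdefault((anchor, offset), set()).update(allowed)
    return [(a, o, frozenset(s)) for (a, o), s in table.items()]


def count_matches(seq, anchors, positions):
    """Count logo positions at which seq carries one of the allowed residues."""
    hits = 0
    for anchor, offset, allowed in positions:
        i = anchors[anchor] + offset
        if 0 <= i < len(seq) and seq[i] in allowed:
            hits += 1
    return hits


def is_likely_ccch(seq, anchors, green_positions=()):
    """Step 1: >= 2 of the 4 red residues. Step 2: >= 3 red-or-green identities."""
    if count_matches(seq, anchors, RED_POSITIONS) >= 2:
        return True
    pooled = merge_positions(RED_POSITIONS, green_positions)
    return count_matches(seq, anchors, pooled) >= 3


if __name__ == "__main__":
    # Toy C-X7-C-X5-C-X3-H motif (Cys at 0-based 2, 10, 16) with F at C1+3.
    seq = "CAC" + "AAFAAAA" + "C" + "AAAAA" + "CAAAH"
    anchors = {"c1": 2, "c2": 10, "c3": 16}
    hypothetical_green = [("c2", +1, frozenset("A")), ("c3", +1, frozenset("A"))]
    print(is_likely_ccch(seq, anchors))                      # False: one red residue only
    print(is_likely_ccch(seq, anchors, hypothetical_green))  # True: 1 red + 2 hypothetical green
```

Pooling red and green residues per position avoids counting a single residue twice if the logo colours both at the same position.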
This investigation of CCCH proteins was initiated by an in silico search for a trypanosome homologue of the budding yeast mRNA export factor Mex67 (NXF1 or TAP in mammalian cells, reviewed in ). In a BLAST search with standard parameters, the protein encoded by Tb11.22.0004 gave the most significant hit (p-value 2.3e-07), and screening the S. cerevisiae proteome with Tb11.22.0004 gave a single hit, Mex67 (p-value 1.5e-07) [Additional file 3: Supplemental Figure S2A]. An InterPro search for domains and motifs in Tb11.22.0004 detected PTHR10662 , characteristic of NXF1-related proteins, and a CCCH zinc finger near the N-terminus. Neither S. cerevisiae Mex67 nor mammalian NXF1 contains a zinc finger [Additional file 3: Supplemental Figure S2A].
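The two-way search described above amounts to a reciprocal best-hit test. The sketch below is one way to express it, assuming BLAST results have already been parsed into tables of (subject, score) pairs in which a smaller p-value or E-value is better. The table layout and function names are hypothetical, parsing real BLAST output is not shown, and the example uses only the two top hits reported in the text (the direction of the first search is an assumption).

```python
from typing import Dict, List, Optional, Tuple

# query ID -> list of (subject ID, p-value or E-value); smaller is better
HitTable = Dict[str, List[Tuple[str, float]]]


def best_hit(table: HitTable, query: str) -> Optional[str]:
    """Subject with the smallest score for this query, or None if no hits."""
    hits = table.get(query, [])
    if not hits:
        return None
    return min(hits, key=lambda hit: hit[1])[0]


def is_reciprocal_best_hit(query: str, forward: HitTable, reverse: HitTable) -> bool:
    """True if the query's best forward hit has the query as its best reverse hit."""
    target = best_hit(forward, query)
    return target is not None and best_hit(reverse, target) == query


if __name__ == "__main__":
    # Assumed direction: yeast Mex67 searched against the T. brucei proteome.
    forward = {"Mex67": [("Tb11.22.0004", 2.3e-07)]}
    # Reverse search: Tb11.22.0004 against the S. cerevisiae proteome.
    reverse = {"Tb11.22.0004": [("Mex67", 1.5e-07)]}
    print(is_reciprocal_best_hit("Mex67", forward, reverse))
```

A reciprocal best hit is supporting evidence for orthology, not proof, which is consistent with the cautious wording used for the putative TbMex67.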
Members of the NXF1 family are generally not very similar to each other; for example, the region of highest identity between Drosophila melanogaster NXF1 and S. cerevisiae Mex67 (residues 107-598, determined by NCBI BLAST2seq) is only 23% identical. Using the same programme, the region of highest identity between S. cerevisiae Mex67 and Tb11.22.0004 was identified as ~160 amino acids at the N-termini (Mex67 residues 96-248 and Tb11.22.0004 residues 49-206), with 31% identity. The same identity was found between Drosophila NXF1 and S. cerevisiae Mex67 for the same region [Additional file 3: Supplemental Figure S2B]. The closest homologues of Tb11.22.0004 in the other Tritryps, Tc00.1047053506127.20/Tc00.1047053508271.4 and LmjF27.1690, also contain a zinc finger near the N-terminus and are most similar to the yeast mRNA nuclear export factor Mex67 and its mammalian orthologue TAP/NXF1.
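Percent identity depends on how the aligned region and the denominator are chosen, which is one reason values from different programmes are not directly comparable. The sketch below assumes two rows of one pairwise alignment, with "-" for gaps, and divides the number of identical columns by all alignment columns, gaps included. This is an assumed convention, not necessarily the one used by BLAST2seq, and the function name is hypothetical.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    if not row_a:
        raise ValueError("empty alignment")
    identical = sum(
        1 for x, y in zip(row_a.upper(), row_b.upper()) if x == y and x != "-"
    )
    return 100.0 * identical / len(row_a)


if __name__ == "__main__":
    # Toy alignment: 4 identical columns out of 6 (one column is a gap).
    print(round(percent_identity("ACDE-G", "ACDQFG"), 1))  # 66.7
```

Counting gap columns in the denominator penalizes indels; dividing by the length of the shorter sequence instead would give a higher value for the same alignment.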
The subcellular localisation of the putative TbMEX67 was determined by fluorescent protein tagging at the endogenous locus of Tb11.22.0004. The protein localized to the nucleoplasm and to spots at the nuclear periphery that resembled, in number and size, the spots seen with an eYFP fusion of the nucleoporin NUP54/57 (Figure 4A). Localization to nuclear pores has been described for both yeast Mex67 and its mammalian homologue TAP. This observation supports, but does not prove, the identification of Tb11.22.0004 as the Mex67 orthologue.
Because an RNA-binding zinc finger had not previously been found in a Mex67 protein, Mex67 homologues from other non-Opisthokonta species were examined. Neither the Mex67 homologue of Dictyostelium discoideum nor that of Entamoeba histolytica (both Amoebozoa) contained a CCCH motif. No close Mex67 homologue was readily recognisable in the available genome sequences of organisms in either the Archaeplastida (plants) or the Chromalveolata (Figure 4B). Furthermore, apart from the kinetoplastids, Mex67 homologues were not readily recognisable in other Excavata species: Naegleria gruberi, Giardia lamblia and Trichomonas vaginalis. The role of the CCCH motif in the putative trypanosome MEX67 remains unknown, but it might indicate differences in the regulation of mRNA nuclear export between trypanosomes and other organisms.
The Leishmania-specific 3' exoribonuclease, LmjF34.1240, is similar to 3' exoribonucleases of various eukaryotes, and the region of homology is not restricted to the exoribonuclease domain but extends over the entire protein [Additional file 4]. The CCCH motif, however, is unique to the Leishmania protein. Several Arabidopsis CCCH proteins possess intrinsic nuclease activity, including CPSF30 and Smic1 [13,14]. However, neither of these proteins has a clearly defined nuclease domain, and the nuclease activity depended on one (CPSF30) or two (Smic1) of the CCCH motifs. To our knowledge, the Leishmania protein is the first protein found to have both a 3' exoribonuclease domain and a CCCH motif, and it would be very interesting to examine whether the CCCH motif is involved in regulating exoribonuclease activity and/or substrate specificity. The exoribonuclease is present in all Leishmania species but absent from all trypanosome species. It is tempting to speculate that the enzyme might be involved in mRNA regulation via the cis-acting element SIDER (Short Interspersed DEgenerated Retrotransposon). SIDER elements are mainly found in the 3' UTRs of Leishmania genes, where they promote mRNA degradation (SIDER2, ) or regulate translation [38,39]. In contrast, SIDER elements are 70 times less abundant in Tb, are usually found in the subtelomeric regions and do not appear to function as regulatory cis-acting elements of mRNAs.
Ubiquitination requires a ubiquitin-activating enzyme (E1), a ubiquitin-conjugating enzyme (E2) and a ubiquitin protein ligase (E3). Substrate specificity is usually determined by the E3 ligase. Three types of E3 ligases can be distinguished, named after their catalytic domains: the RING type, the (RING-related) U-box type and the HECT type. Substrates can be monoubiquitinated, multi-ubiquitinated (monoubiquitinated on multiple lysine residues) or polyubiquitinated (carrying a chain of ubiquitins). Depending on the type and site of ubiquitination, ubiquitinated proteins are either targeted for degradation by the 26S proteasome, or the ubiquitination acts as a signal, for example to change the intracellular localization of the protein.
Among the kinetoplastid CCCH proteins are several with a predicted, and in one case experimentally confirmed, connection to ubiquitination. Both ZFP2 and ZFP3 have a motif upstream of the WW domain that is most similar to a motif upstream of the WW domain of HECT-type E3 ligases of the Nedd4 family [22,24] (Figure 2). CSBP (ZC3H27) has two different types of ubiquitin-interacting domains: UBA (ubiquitin-associated domain) and CUE (Figure 2). UBA domains bind polyubiquitin with the highest affinity; CUE domains, in contrast, have been shown to bind monoubiquitin and to promote autoubiquitination (reviewed in ). Indeed, a fraction of the Leishmania CSBP protein has been shown to be monoubiquitinated, and a small fraction to be either polyubiquitinated or multi-monoubiquitinated, consistent with the CUE domain mediating autoubiquitination. Two putative E3 ubiquitin ligases are among the Kinetoplastida CCCH finger proteins: one U-box-type and one HECT-type E3 ubiquitin ligase (Figure 2).
Proteins with a combination of domains associated with ubiquitination and RNA binding are common in many species ; trypanosomes are no exception. Ubiquitination may play important roles in regulating the stability or localization of RNA binding proteins. C. elegans oogenesis and embryogenesis provide two good examples: five germline-specific CCCH proteins (PIE-1, POS-1, MEX-1, MEX-5 and MEX-6) are degraded in somatic cells through binding of the CCCH finger-binding protein ZIF-1 to the CCCH motif and subsequent recruitment of an E3 ubiquitin ligase . OMA-1, a CCCH protein involved in the asymmetric distribution of determinants in the egg, is degraded during the first zygotic cell cycle via a ZIF-1-independent E3 ubiquitin ligase complex .
Four of the trypanosome CCCH proteins that have a predicted or known connection to ubiquitination are involved in the regulation of the life cycle or cell cycle. In one case, CSBP, ubiquitination of the CCCH protein has been shown. It is possible that ubiquitination is a mechanism to quickly change either stability or intracellular localization of CCCH proteins in response to life and/or cell cycle triggers.
Most CCCH proteins are present in all three Tritryps (Figure 3A); the average amino acid sequence identity between the closest homologues in Tb and Lm is 32% (Figure 2). Nevertheless, CCCH protein content varies between the Tritryps (Figure 2); to examine the origin of these differences, the synteny between the genomic loci encoding the Tb and Lm CCCH proteins was examined.
The CCCH proteins of Tb and Lm are distributed over 47 genomic loci (Figure 5). Twenty-seven of these loci contain a single CCCH protein gene; of these, 19 have full synteny, seven have synteny with minor changes close to the locus and one has synteny to one side only (Figure 5A). Another 11 loci contain tandemly repeated genes in Tb and/or Lm, with at least one pair shared between the species (Figure 5B-E). These 11 fall into four groups. Two loci have full synteny, with significantly higher similarity between the homologous pairs than between the duplicated genes within the locus, indicating that gene duplication took place before the separation of Tb and Lm (Figure 5B). Three loci have a duplicated gene present in only one of the species, but again with significantly higher similarity between the homologues from the two species than between the duplicated genes; this is probably best explained by the loss of one of the duplicated genes from one species after the separation of Tb and Lm (Figure 5C). One locus has three identical copies of a Tb gene, while the syntenic Lm locus has only one gene with 31% identity to the Tb genes, indicating that the gene duplication probably occurred after divergence (Figure 5D). The remaining five loci also differ between the two species in the number of duplicated genes, but it remains unclear whether this originated from a gene duplication event in one species or from the loss of a gene in the other (Figure 5E).
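The reasoning used above to date the tandem duplications can be written as a small decision function. This is a sketch under stated assumptions: the text does not give a numerical threshold for "significantly higher similarity", so MARGIN is a hypothetical cut-off, and the function and category names are illustrative.

```python
# Hypothetical cut-off (percentage points) standing in for "significantly higher".
MARGIN = 10.0


def classify_tandem_locus(ortholog_identity: float, paralog_identity: float,
                          copies_tb: int, copies_lm: int,
                          margin: float = MARGIN) -> str:
    """Infer duplication timing at a tandem locus shared by Tb and Lm.

    ortholog_identity: % identity of the best Tb-Lm pair at the locus.
    paralog_identity:  % identity between duplicated genes within a species.
    """
    if ortholog_identity >= paralog_identity + margin:
        # Cross-species pairs are more alike than paralogues:
        # the duplication predates the Tb/Lm split.
        if copies_tb == copies_lm:
            return "duplicated before split"
        return "duplicated before split, one copy lost"
    if paralog_identity >= ortholog_identity + margin and copies_tb != copies_lm:
        # Paralogues are more alike than any cross-species pair:
        # the duplication postdates the split.
        return "duplicated after split"
    return "unresolved"


if __name__ == "__main__":
    # Three identical Tb copies vs one Lm gene at 31% identity (values from the text).
    print(classify_tandem_locus(31, 100, copies_tb=3, copies_lm=1))
```

With the values given for the Figure 5D locus, the function returns "duplicated after split"; loci where neither identity clearly exceeds the other fall into "unresolved", mirroring the five loci of Figure 5E.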
Nine loci have a CCCH protein gene in only one of the two species (Figure 5F and 5G). For eight loci, regional synteny is still intact, usually with one or two other genes out of synteny in addition to the gene coding for the CCCH protein (Figure 5F). In one case, synteny was lost (Figure 5G). Are the differences between these loci due to the loss or the gain of the gene in one of the species? Four of the Lm genes are also present in Tc, indicating that loss of the Tb gene is the more likely scenario, since the Leishmania lineage separated from the trypanosomes before Tb and Tc separated. Four of the Lm genes are absent from both Tc and Tb, suggesting that they either arose after the separation of Leishmania from the trypanosomes or were lost, after that separation, in the common ancestor of Tc and Tb. One gene is present in Tb and Tc but not in Lm, indicating that it was either lost in Leishmania or gained in the common ancestor of Tc and Tb.
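The argument above is a parsimony argument on the species tree ((Tb, Tc), Lm). The sketch below enumerates presence or absence of the gene at the two ancestral nodes and keeps the scenarios with the fewest gains and losses. It assumes gains and losses are weighted equally and ignores horizontal transfer; the edge labels and the function name are hypothetical.

```python
from itertools import product


def parsimonious_scenarios(tb: bool, tc: bool, lm: bool) -> list:
    """All minimal sets of gains/losses explaining a presence/absence pattern
    on the rooted tree ((Tb, Tc), Lm)."""
    scenarios = []
    for root, tbtc in product((False, True), repeat=2):
        # Each edge maps to (state at parent, state at child).
        edges = {
            "root->TbTc ancestor": (root, tbtc),
            "TbTc ancestor->Tb": (tbtc, tb),
            "TbTc ancestor->Tc": (tbtc, tc),
            "root->Lm": (root, lm),
        }
        changes = [(edge, "gain" if child else "loss")
                   for edge, (parent, child) in edges.items() if parent != child]
        scenarios.append(changes)
    fewest = min(len(s) for s in scenarios)
    return [s for s in scenarios if len(s) == fewest]


if __name__ == "__main__":
    print(parsimonious_scenarios(tb=False, tc=True, lm=True))   # present in Lm and Tc
    print(parsimonious_scenarios(tb=False, tc=False, lm=True))  # Lm only
```

For a gene present in Lm and Tc but not Tb, the only minimal scenario is a single loss on the Tb branch. For an Lm-only gene, a gain on the Lm branch and a loss in the Tb/Tc ancestor tie with one change each, which is why both possibilities are left open above.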
Taken together, the analysis confirms that the majority of the kinetoplastid CCCH protein genes evolved prior to the separation of the species, and the presence of many loci with more than one CCCH protein gene suggests that gene duplication was important in the evolution of CCCH proteins. Differences in CCCH protein gene content between Tb and Lm can be accounted for by (i) the loss or gain of a single gene or (ii) gene duplication or loss of a previously duplicated gene; in both cases, this occurred with only small changes in synteny.
Five proteins were classified as "possible CCCH proteins" and excluded from the final list because the CCCH motif was absent from one or both of the other Tritryps. Alignments are shown in Figure 6 and give some insight into how CCCH motifs might have evolved. One Leishmania protein (LmjF34.0870) has a large insertion in the middle of its CCCH motif, while the CCCH motifs of Tb and Tc are intact (Figure 6A); it is unlikely that the Leishmania protein contains a zinc finger. A Tb protein (Tb927.3.1610) has a CCCH motif in the middle of a kinase domain; the motif is absent from the other Tritryps despite the high degree of conservation in this region (Figure 6B). Similarly, the previously classified Tb zinc finger protein ZC3H16 (Tb927.7.250) has a conventional CCCH motif that is absent from its closest homologues in Tc and Lm (Figure 6C). In this case, the selective loss or gain of a CCCH motif during evolution is clearly visible: T. congolense, a close relative of Tb, has two of the cysteine residues of the CCCH motif, while Lm and Tc have only one. A similar case for a Leishmania protein (LmjF24.1740) is shown in Figure 6D. In another case, a C-X7-C-X4-C-X3-H motif is present in Tb and Tc (Tb11.02.4190; Tc00.1047053506297.130; Tc00.1047053510101.200) but absent from Leishmania (LmjF11.0930). However, the Leishmania protein has very little similarity to its Tb and Tc homologues in the region of the CCCH motif (Figure 6E). Here, the CCCH motif was probably lost or gained through larger genomic rearrangements, and it might be fully functional in Tb and Tc. Without experimental data, it is impossible to predict whether any of these possible CCCH proteins have functional CCCH type zinc fingers.
The number of non-redundant CCCH proteins in the Tritryps (48 in Tb, 51 in Tc, 54 in Lm) is similar to the number in higher eukaryotes: Arabidopsis, rice, mouse and human have 68, 67, 58 and 55 predicted CCCH proteins, respectively [16,44]. The fraction of CCCH proteins with more than one CCCH motif (34%) is about half of that found among the rice (64.2%) or Arabidopsis (63.2%) CCCH proteins.
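Assuming the 34% figure is pooled over the three Tritryps from the multi-motif counts given earlier (16/48, 20/51 and 16/54), the comparison works out as follows:

```latex
\frac{16 + 20 + 16}{48 + 51 + 54} = \frac{52}{153} \approx 0.34,
\qquad
\frac{34\%}{64.2\%} \approx 0.53,
\qquad
\frac{34\%}{63.2\%} \approx 0.54
```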
It has been suggested previously, based on a comparison with yeast, that kinetoplastids have significantly more CCCH proteins than other lower eukaryotes. In the absence of published data, we attempted an estimate by screening protozoan and yeast genomes for the two conventional CCCH motifs and, as a background control, for two motifs whose spacing falls outside the CCCH definition (C-X7-C-X7-C-X3-H and C-X8-C-X7-C-X3-H), without any filtering or evaluation steps. The results are summarized in Table 1: kinetoplastids have slightly more CCCH motifs than any other lower eukaryote investigated, but they are closely followed by T. gondii and D. discoideum. Other protozoa and yeast have fewer CCCH motifs. These data are only an estimate, but they suggest that the number of CCCH proteins in trypanosomes is only slightly higher than in some protozoa, while other protozoa and the two yeasts have fewer CCCH proteins.
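A sketch of this unfiltered estimate is shown below. It assumes each genome is supplied as a list of protein sequences (FASTA parsing is omitted) and counts overlapping matches of each fixed-spacing pattern with a zero-width regular-expression lookahead. The pattern names mirror the motifs above; the function name is hypothetical.

```python
import re
from typing import Dict, Iterable

PATTERNS = {
    # conventional CCCH classes
    "C-X7-C-X5-C-X3-H": r"C.{7}C.{5}C.{3}H",
    "C-X8-C-X5-C-X3-H": r"C.{8}C.{5}C.{3}H",
    # background controls (middle spacer outside the X4-6 CCCH definition)
    "C-X7-C-X7-C-X3-H": r"C.{7}C.{7}C.{3}H",
    "C-X8-C-X7-C-X3-H": r"C.{8}C.{7}C.{3}H",
}
# Wrapping each pattern in a lookahead makes matches zero-width,
# so overlapping occurrences are all counted by finditer.
COMPILED = {name: re.compile("(?=" + pat + ")") for name, pat in PATTERNS.items()}


def count_motifs(proteome: Iterable[str]) -> Dict[str, int]:
    """Total (overlapping) matches of each pattern across all proteins."""
    counts = {name: 0 for name in PATTERNS}
    for seq in proteome:
        seq = seq.upper()
        for name, regex in COMPILED.items():
            counts[name] += sum(1 for _ in regex.finditer(seq))
    return counts


if __name__ == "__main__":
    toy = "C" + "A" * 7 + "C" + "A" * 5 + "CAAAH"
    print(count_motifs([toy, toy]))
```

Comparing the conventional counts against the control counts gives a rough sense of how far the CCCH signal rises above chance occurrences of cysteine/histidine spacings in each proteome.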
This study has identified the entire set of CCCH proteins in the available genomes of the Tritryps: excluding redundancy, there are 48 in Tb, 51 in Tc and 54 in Lm. The main findings are: (i) The fraction of CCCH proteins with more than one CCCH motif is larger than previously assumed; many of these proteins have one non-conventional CCCH motif. (ii) The putative Mex67 orthologue and a Leishmania-specific 3'-5' exoribonuclease both have a CCCH motif that is not found in their counterparts in other eukaryotes. Several of the CCCH proteins have a predicted, or in one case experimentally confirmed, connection to ubiquitination pathways. (iii) Kinetoplastids have only slightly more CCCH proteins than some other protozoa, although they have more than yeast. (iv) The vast majority of the CCCH proteins are unique to kinetoplastids or to a subgroup within them. Most evolved before the separation of the Tritryps, and gene duplication played an important role. Differences in CCCH protein content between the Tritryps are mainly due to the loss or gain of a single gene, or to gene duplication or loss of a previously duplicated gene; in all cases with little disruption of synteny.
The identification of CCCH proteins in this study relies entirely on in silico data. Some of the identified proteins might not be true CCCH zinc finger proteins, whilst others might have been missed. For instance, some putative CCCH proteins were excluded from the final list because the CCCH motif was absent from their closest homologues in one or both of the other Tritryps; they might, however, be true CCCH proteins. Although the majority of CCCH motifs bind RNA, a few examples of DNA-binding CCCH motifs have been reported (for example [45-47]); thus, some of the identified trypanosome CCCH proteins might not be RNA-binding proteins.
Experimental approaches are now needed to verify the in silico data and to examine the function of the many uncharacterized proteins. Of particular interest is the function of a CCCH motif in the putative nuclear export factor Mex67 and in the Leishmania-specific 3'-5' exoribonuclease. Both CCCH motifs are unique features of the Kinetoplastida proteins and might reveal differences to other eukaryotes in mRNA metabolism.
Sequence logos were produced using the software of . All analyses of the Tritryp genomes were performed using the tools at the Tritryps genome database , GeneDB (http://www.genedb.org/Homepage) or the EBI. Multiple sequence alignments were performed with ClustalW2 using the default settings provided by the server (http://www.ebi.ac.uk/Tools/clustalw2/index.html). Pairwise alignments were performed using the EMBOSS Needle programme (http://www.ebi.ac.uk/Tools/emboss/align/index.html) for global alignments and the Water programme or BLAST2seq for local alignments. Protein domains were identified using Pfam , SMART [53,54], InterPro or Prosite .
The Tritryp genome databases  were searched for CCCH motif-containing proteins using a motif search for C-X4-15-C-X4-6-C-X3-H. A sequence logo was created from only those C-X7/8-C-X5-C-X3-H motifs that were recognized by at least one of SMART (Sm00356), Pfam (PF00642) or InterPro (IPR000571) (the training set), which are therefore very likely to be genuine CCCH motifs. This consensus motif was then used to define arbitrary conditions for filtering all CCCH motifs that did not belong to the training set. The stringency of the chosen conditions was tested on the training set and gradually decreased until all proteins in the training set were included. The dataset was then filtered manually to exclude proteins unlikely to contain CCCH motifs, using the criteria described in the results section.
The dataset contained four genes that were annotated as pseudogenes in the genome databases, two in L. major (LmjF02.0100 and LmjF22.0130) and two in T. cruzi (Tc00.1047053506977.110 and Tc00.1047053511715.50). Both Leishmania pseudogenes have premature stop codons and longer counterparts in other Leishmania strains (see Figure 2). Confirmation of the sequence for the Leishmania major isoforms was obtained from Matt Rogers (Sanger Institute). Tc00.1047053506977.110 also has a premature stop codon and a longer counterpart in T. congolense. Tc00.1047053511715.50 has an internal shift out of frame and back into frame; the CCCH motif lies in the out-of-frame region and is therefore an artefact.
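A premature in-frame stop codon, as in the pseudogenes above, is straightforward to flag in a coding sequence. The sketch below assumes the standard genetic code for nuclear genes (TAA, TAG and TGA as stops) and a sequence that begins at its first codon. It does not detect frameshifts such as the one in Tc00.1047053511715.50, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds: str) -> list:
    """0-based codon indices of in-frame stop codons before the final codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA input
    n_codons = len(cds) // 3             # trailing partial codon is ignored
    codons = [cds[3 * i: 3 * i + 3] for i in range(n_codons)]
    # The last codon is the expected terminator, so only earlier codons are checked.
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


if __name__ == "__main__":
    print(premature_stops("ATGAAATAGCCCTAA"))  # [2]: TAG is the third codon
    print(premature_stops("ATGAAACCCTAA"))     # []: only the terminal stop
```

A hit only marks a candidate pseudogene; as described above, the annotation still has to be checked against sequencing data and orthologues in related species.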
The putative T. brucei Mex67 orthologue (Tb11.22.0004) and NUP54/57 (Tb927.4.5200) were expressed in Trypanosoma brucei Lister 427 procyclic cells as C-terminally eYFP-tagged fusion proteins from their endogenous loci as described in . For microscopic imaging, cells were washed once in SDM79 without serum or haem, fixed at a density of 1 × 10^7 cells/ml with 2.4% paraformaldehyde overnight, washed once in PBS and stained with Hoechst H33258. Confocal images were acquired using a BioRad Radiance 2100 on a Nikon Eclipse E800 upright microscope with a 100×/1.4 oil DIC objective. Transgenic trypanosomes were generated using standard procedures .
Tb: Trypanosoma brucei; Lm: Leishmania major; Tc: Trypanosoma cruzi; Lb: Leishmania braziliensis; Li: Leishmania infantum; Tc:Trypanosoma congolense; CSBP: cycling sequence binding protein.
All experimental and in silico work was done by SK. The manuscript was prepared by SK and MC. NCK cloned and characterized the T. brucei Mex67 protein. All authors approved the content of the manuscript.
Figure S1: Sequence logos of the C-X7-C-X5-C-X3-H and C-X8-C-X5-C-X3-H zinc fingers of Tb, Tc and Lm. Only CCCH motifs that were recognized by at least one of SMART, Pfam or InterPro are included. Tc CCCH motifs present more than once because of allelic variants were not included. The sequence logos shown here were used to create the combined Tritryp sequence logo shown in Figure 1.
Table S1: List of the Tritryp CCCH proteins. List of the Tritryp CCCH proteins, including molecular weight, class of the CCCH motif and redundant genes.
Figure S2: Alignment of the putative TbMex67 with homologues of other organisms. A) Alignment of S. cerevisiae Mex67 with T. brucei Tb11.22.0004. B) Alignment of the region of greatest identity between S. cerevisiae Mex67, Tb11.22.0004 and D. melanogaster NXF1.
Figure S3: Alignment of the Leishmania 3'-5' exoribonuclease (LmjF34.1240) with the homologous proteins of Mus musculus.
This work was funded by the Wellcome Trust. NCK held a Medical Research Council PhD studentship. We would like to thank Matt Rogers (Sanger Institute) for help with the identification of Leishmania pseudogenes.