UBC and UEV proteins in C. elegans
On the basis of the C. elegans
genome sequence we have identified 20 ubiquitin-conjugating enzymes. Gene and protein names, prefaced by ubc-
or UBC-, respectively, have been assigned to each (Table ). Two C. elegans ubc
genes were previously arbitrarily named ubc-1
and let-70 (ubc-2)
]. These gene names do not correspond to the names of the orthologous yeast genes. C. elegans let-70 (ubc-2)
is the ortholog of S. cerevisiae UBC4/5
whereas C. elegans ubc-1
more closely resembles yeast UBC2.
To avoid further confusion, we did not use numbers 4 or 5 in the C. elegans ubc
nomenclature (10 and 11 also were not used as there were no clear C. elegans
orthologs of the corresponding S. cerevisiae
genes). Wherever possible, the assigned names do correspond to the numbering system used for yeast orthologs; however, given the disparity in gene numbers between the two species, it proved impracticable to establish a clear correspondence between all members of the family (see below).
RNAi with C. elegans ubiquitin-conjugating enzyme genes
The identification of a coding sequence as a member of the UBC family was based on two criteria: the presence of the UBC protein motif (UBCc in SMART, or UQ_con in Pfam; see Materials and methods) and, within this motif, the presence of an active-site cysteinyl residue. Fourteen of the 20 C. elegans UBCs have corresponding cDNA clones. These clones not only demonstrate that these genes are expressed, but also allow confirmation of the Caenorhabditis database (AceDB) gene predictions.
Three predicted genes failed to match their corresponding cDNA sequences: D1022.1 and R01H2.6 had incorrectly predicted amino termini, and Y54G2A.23 proved to be a fusion of two separate genes, of which only the second part corresponded to a ubc cDNA. The R09B3.4 gene prediction in AceDB consists of a fusion of a UBC protein with a transthyretin. This prediction agrees with one cDNA, although several other cDNAs encode only the UBC portion. We have therefore used only the UBC portion in this analysis. Given that the algorithms used for gene prediction are not infallible, ubc genes that are not confirmed by cDNA sequences should be considered as tentative until backed by experimental evidence.
Sequence alignment of all the putative C. elegans UBC proteins (see below) permitted some further refinement of the gene predictions not supported by cDNA sequence. C06E2.3 appeared to have an incorrectly predicted intron, and an alternate splice site nearby eliminated a block of 15 amino acids that did not align with the other sequences. Similar intron boundary changes were made in F49E12.4 and Y94H6A.6, eliminating groups of 6 and 22 unaligned amino acids, respectively. C06E2.3 also appeared to have an incorrectly predicted amino terminus, by comparison with the amino terminus of F40G9.6, to which it is closely related. Accordingly, minor changes were made in these predicted genes before the protein sequences were used in the subsequent analysis. The AceDB gene prediction F52C6.12 is very similar in sequence to let-70 (ubc-2) but lacks most of the UBC motif and the active-site cysteinyl residue. This gene was not considered to encode a functional UBC.
In addition to the 20 ubc
genes that meet the criteria described above, the C. elegans
genome also contains three genes that possess the UBCc motif, but lack the active-site cysteinyl residue (Table ). These genes have been named ubiquitin E2 variants, abbreviated as uev
]. The UEV-1 sequence was confirmed by existing cDNAs, and cDNA sequences encoding UEV-3 were obtained by reverse transcription PCR (RT-PCR). We also detected an aberrant splice variant of the latter gene that had a different carboxy-terminal sequence (data not shown). The uev-2
gene has no corresponding cDNA sequences.
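The two membership criteria and the UBC/UEV distinction described above amount to a simple decision rule. The following is a minimal Python sketch of that rule, not the procedure actually used in the analysis. The function name `classify_e2` and its arguments are hypothetical, and the sketch assumes that motif detection (for example by SMART or Pfam) and identification of the aligned active-site position have already been done elsewhere.

```python
from typing import Optional


def classify_e2(has_ubc_motif: bool, active_site_residue: Optional[str]) -> str:
    """Classify a predicted protein as 'UBC', 'UEV', or 'other'.

    has_ubc_motif: True if a UBC motif was detected (hypothetical input,
        e.g. taken from a prior SMART or Pfam search).
    active_site_residue: one-letter code of the residue found at the
        aligned active-site position, or None if that position is absent.
    """
    if not has_ubc_motif:
        return "other"  # fails the first criterion
    if active_site_residue is not None and active_site_residue.upper() == "C":
        return "UBC"  # motif plus active-site cysteine
    return "UEV"  # motif present, active-site cysteine absent
```

Under this rule, a prediction that lacks most of the motif would fall into the "other" class, whatever residue occupies the active-site position.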
Recent studies using gene microarrays [14
] enabled us to ask whether C. elegans
UBC and UEV genes are expressed under normal growth conditions. The array studies show that some predicted ubc
genes do not produce detectable levels of mRNA during normal development. In general, these genes are the same ones for which there are currently no known cDNA sequences. Interestingly, four C. elegans ubc
genes (B0403.2, C06E2.3, C06E2.7, C28G1.1) that are clustered on the X chromosome have yielded no detectable mRNA, and no corresponding cDNAs have been identified [15
]. We carried out extensive RT-PCR using oligonucleotides corresponding to the sequence of two of these genes, C06E2.3 and C06E2.7, but no products were obtained. These results do not preclude the possibility that these genes are induced under special conditions, or that their mRNAs are particularly short-lived or rare. The DNA microarray data show, for example, very low message levels for ubc-1,
even though this gene is expressed throughout development [11].
To date, DNA microarray data are available on C. elegans
gene expression profiles throughout development [15
], in males versus hermaphrodites [16
], and in germline versus soma [14
]. All of the ubc
genes with measurable transcript levels vary in expression through development, with mRNA levels always highest in the embryonic stages. In most cases, transcript levels drop in the early larval stages, then increase again in the fourth larval stage and in young adults. These results roughly parallel the amount of cell division taking place in the nematode, which is very high in the embryo, then decreases through the larval stages until the maturation of the gonad and commencement of oogenesis in the fourth larval stage. The available DNA microarray data suggest that most of the genes included in the chips do not show raised message levels in the germline or large differences in expression levels in oocytes versus sperm. However, levels of F49E12.4 message are significantly reduced in the male (the ratio in males versus hermaphrodites is 0.18 averaged over four experiments), whereas F29B9.6 and R01H2.6 mRNAs are enriched roughly twofold in oocytes compared to sperm [14].
The C. elegans
UBC and UEV proteins were aligned with the predicted set of human and Drosophila
UBC proteins using the ClustalW program [17
]. The UBC sequences for the latter two species were obtained from FlyBase [18
] and from GeneCards [19
], with some additional human sequences obtained by a BLAST search of the published human genome using C. elegans
LET-70 (UBC-2) as the query sequence. These additional human proteins are identified by their GI (GeneInfo Identifier) numbers and the other human gene names follow the HUGO nomenclature [20
]. The NCUBE1 sequence was obtained from Lester et al.
] and is derived solely from cDNA sequence. Twenty-five Drosophila
proteins and 26 human proteins were included in the analysis, but additional human UBCs will probably be revealed when the genome sequence is fully assembled. Human and Drosophila
proteins were not included if they lacked the active-site cysteinyl residue. Alignment of the UBC and UEV proteins revealed some interesting differences in the region surrounding the active site. As shown in Figure , this region is demarcated by two invariant residues, a proline (P, in green) and a tryptophan (W, in yellow). The active-site cysteinyl residue (C, in red) is present in the UBC but not the UEV proteins. C. elegans
C06E2.7 has a cysteinyl residue in the active-site region that does not align well with the other UBCs. Functional studies will be required to determine if this protein is in fact a UBC. The alignment in Figure is separated into groups by horizontal lines which, along with the adjacent Roman numerals, denote branches on the phylogenetic tree described below.
Figure 1 ClustalW alignment of the region surrounding the active site of the C. elegans, Drosophila and human UBC proteins. The names of the proteins are color-coded: C. elegans (green), Drosophila (blue), and human (red).
Many UBCs contain the tripeptide motif HPN (single-letter amino-acid nomenclature; Figure , in yellow), which is important for proper folding of the active-site region [1
]. Variations in the HPN tripeptide occur in several of the C. elegans ubc
genes. For example, B0403.2 has the sequence NPN, which is shared by two human E2s, BAB14320 and BAB14724 (Figure , group II, highlighted in blue). Group V UBCs have the sequence HCN (Figure , yellow). The most extreme variation in this region is seen in a group of four proteins that includes C. elegans
D1022.1 and Y110A2AR.2 as well as Drosophila
CG5823 and human NCUBE1. These proteins have the sequence T(P/A)NGR (Figure , top, group XVIII, blue letters), a variation that also occurs in S. cerevisiae
Ubc6p. Partly because of this difference, this subgroup has been referred to as non-canonical ubiquitin E2s - NCUBEs [21
]. The effect of such a variation on the structure of the active-site region is unknown.
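The motif variants named above (HPN, NPN, HCN and T(P/A)NGR) can be recognized with simple pattern matching. The sketch below is illustrative only. The function name `classify_hpn_variant` is hypothetical, and the input is assumed to be the already-extracted active-site region in one-letter code, because a bare tripeptide such as HPN could occur by chance elsewhere in a full-length protein.

```python
import re
from typing import Optional

# Variants named in the text, checked from most to least specific.
# The labels are for illustration only.
_VARIANTS = [
    ("T(P/A)NGR", re.compile(r"T[PA]NGR")),
    ("HPN", re.compile(r"HPN")),
    ("NPN", re.compile(r"NPN")),
    ("HCN", re.compile(r"HCN")),
]


def classify_hpn_variant(region: str) -> Optional[str]:
    """Return the label of the first matching variant, or None.

    `region` is assumed to be the active-site region only (one-letter
    amino-acid codes), not a full-length protein sequence.
    """
    region = region.upper()
    for label, pattern in _VARIANTS:
        if pattern.search(region):
            return label
    return None
```

The five-residue pattern is tested first so that a longer, more specific match is not masked by a shorter one.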
Another striking difference among the predicted UBCs is a ten amino-acid insertion between the active-site cysteine and a highly conserved tryptophan in F58A4.10 (group XIII), Y71G12B.15 (group XIV-UBC3) and Y87G2A.9 (group XV-UBC7). This insertion is common to similar human and Drosophila
proteins, including human Cdc34 and Drosophila
Courtless. Other UBCs have smaller sequence insertions (or small deletions) in the same region. The accommodation of variable numbers of extra amino-acid residues at this position is consistent with the three-dimensional structure of the UBC core domain [22
], as this region is expected to lie on the surface of the protein. Several UBCs, including C. elegans
B0403.2 and human BAB14724, have a smaller insertion on the amino-terminal side of the HPN motif.
A phylogenetic analysis was carried out on the C. elegans,
human and Drosophila
UBC proteins using the Phylip package of programs (see Materials and methods), setting C. elegans
Y69H2A.9 as the outlier of an unrooted tree. Y69H2A.9 is most similar to the mouse fused-toes (Ft1)
gene product [23
], being 36% identical in amino-acid sequence over 190 residues. Interestingly, the mouse Ft1
gene encodes a UEV, whereas Y69H2A.9 has an active-site cysteinyl residue. C. elegans
UEV-type proteins were included in the tree, but those from the Drosophila
and human proteomes were not. This analysis (Figure ) shows that most C. elegans
UBCs have orthologous Drosophila
or human proteins, or both. Notably, however, some branches on the tree contain only human and Drosophila
sequences. For example, human UBE2H10 has an ortholog in Drosophila
(CG10682) but not in C. elegans
(Figure , group XVII). UBE2H10 is involved in B-type cyclin degradation through its association with the anaphase-promoting complex, and dominant-negative mutants of UBE2H10, in which the active-site cysteine is changed to a seryl residue, arrest cells in M phase [24
]. The most closely related yeast protein, Ubc11p, is not a functional ortholog of the mammalian protein [25
]. B-type cyclins in C. elegans
] and in yeast must therefore be targeted by a different UBC class.
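Pairwise identity figures such as the 36% over 190 residues quoted above for Y69H2A.9 and the Ft1 product are computed from an alignment. The following minimal sketch shows one common convention, in which columns with a gap in either sequence are ignored. The function name `percent_identity` is hypothetical, and the source does not state which convention was used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where both sequences have a residue.

    Both inputs must be rows of the same alignment (equal length), with
    '-' marking gaps. Columns with a gap in either row are skipped, which
    is only one of several conventions in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared
```

For scale, 36% identity over 190 compared residues corresponds to roughly 68 identical positions.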
Figure 2 Phylogenetic relationship of the UBC proteins of C. elegans (green), Drosophila (blue), and human (red). The dendrogram was prepared using the Phylip package of programs as described in Materials and methods.
A second phylogenetic lineage that lacks a C. elegans
representative consists of the human proteins UBE2E1 and UBE2E3 together with three Drosophila
proteins (Figure , V). These E2s are all structurally related to the yeast Ubc4/5 type sequence (group IV), but differ in having a variant HCN tripeptide in the active-site region (see Figure , V) and by the presence of an amino-terminal extension that is rich in seryl residues. For example, nine of the first 20 residues of UBE2E3 are serines. UBE2E1 may interact with the HECT-domain family member E6-AP [27
], or with another HECT family protein, RSP5 [28
]. E6-AP, as part of a larger complex, mediates ubiquitylation of p53, while yeast Rsp5 and its mammalian counterpart Nedd4 mediate ubiquitylation of a variety of cell-surface proteins that are subsequently degraded in the lysosome. It remains to be determined if the serine-rich regions of group V E2s are involved in phosphorylation-mediated regulation of E2 function as suggested by Matuschewski et al.
]. As in the case of UBE2H10 described above, the functions of UBE2E1 and UBE2E3 must be carried out by another UBC family member in C. elegans.
Most branches on Figure have at least one C. elegans
representative sequence. An interesting case that includes two C. elegans
UBCs occurs in branch XVIII, corresponding to yeast Ubc6p. Ubc6p has a transmembrane domain in its carboxy-terminal extension that anchors the protein in the membranes of the endoplasmic reticulum (ER). The anchored E2 functions in the ubiquitylation of misfolded proteins that are translocated back out of the ER [30
]. The only C. elegans
UBC containing a transmembrane domain in a carboxy-terminal extension is D1022.1. However, Y110A2AR.2 is closely related to D1022.1 but has only a short carboxy-terminal extension and lacks the membrane anchor. A unifying feature of the proteins in this group is the variant T(P/A)NGRF motif in the active-site region (Figure , XVIII). As mentioned above, the same variation is present in yeast Ubc6p. To date, C. elegans
Y110A2AR.2 is the only member of this group lacking the membrane anchor.
The branch containing human HIP2 and Drosophila
UbcD4 (Figure , IX) also contains three C. elegans
proteins, two of which (C06E2.3 and C28G1.1) are encoded by genes not yet confirmed by cDNA sequence. Of these three C. elegans
proteins, only F40G9.3 contains a UBA, or ubiquitin-associated domain [31
]. This domain occurs in all the human and Drosophila
UBCs in the branch, although its significance remains unclear. The UBA domain also occurs in C06E2.7, which is perhaps more closely related to group VII proteins. The 340-residue extension of C28G1.1 is similar in sequence to the carboxy terminus of avian FAS-associated factor 1 (FAF1), which mediates apoptosis in L cells [32].
Two groups in Figure contain type II UBC proteins with acidic carboxy-terminal extensions, implicated in target protein recognition and UBC protein dimerization. In group XI (UBC8-type) proteins, the number of acidic residues is lower than that in group XIV (UBC3-type) proteins. For example, Y94H6A.6 (UBC8-type) has an acidic domain consisting of 17 residues, nine of which are aspartyl or glutamyl residues. Y71G12B.15 (UBC3-type), however, has a domain consisting of 32 residues, 20 of which are acidic. In yeast, Ubc3p/Cdc34p is involved in the ubiquitylation of several cell-cycle-related proteins including cyclin 2 (Cln2) and Cln3 [33
]. Other targets of Cdc34-mediated ubiquitylation have recently been discovered, including repressors of cyclic AMP-induced transcription [34
] and the oncoprotein B-Myb [35
], among others. Much less is known about the UBC8-type proteins, although it was recently shown that yeast Ubc8p regulates the ubiquitylation of the gluconeogenic enzyme fructose-1,6-bisphosphatase [36
]. C. elegans
UBC-1 (group XVI) also has an acidic carboxy-terminal extension [11
], but human and Drosophila
orthologs of this protein lack the acidic domain.
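The comparison of acidic carboxy-terminal extensions above rests on counting aspartyl (D) and glutamyl (E) residues within a segment. The sketch below shows that calculation and applies it to the counts quoted in the text. The function name `acidic_fraction` is hypothetical, and no real extension sequences are reproduced here.

```python
ACIDIC = frozenset("DE")  # aspartate (D) and glutamate (E), one-letter codes


def acidic_fraction(segment: str) -> float:
    """Fraction of residues in `segment` that are aspartate or glutamate."""
    if not segment:
        raise ValueError("segment must not be empty")
    n_acidic = sum(1 for residue in segment.upper() if residue in ACIDIC)
    return n_acidic / len(segment)


# Fractions implied by the counts quoted in the text.
ubc8_type_fraction = 9 / 17   # Y94H6A.6: 9 acidic residues out of 17
ubc3_type_fraction = 20 / 32  # Y71G12B.15: 20 acidic residues out of 32
```

On these counts the UBC3-type domain is both longer and somewhat more acidic in proportion (about 0.63 versus about 0.53).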