This study focuses on characterizing the origin and evolutionary history of HECT ubiquitin ligases. The combination of sequence and structural analyses (Figures and ) allows a natural classification of these proteins to be established. This classification turns out to be totally different from the one hitherto assumed. The division into sixteen subfamilies (groups of proteins that have very similar sequences and structures) is much more precise than the classification into three groups proposed before [4]. Moreover, I have found that at least two of the three groups defined in previous studies are not monophyletic and therefore, not being real evolutionary units, should not be used for classification purposes. This work thus provides a new evolutionary paradigm for the HECT family.
Twelve of the sixteen subfamilies generally group single orthologous genes in all animal species in which they are present, while the other four (NEDD4, UBE3B/3C, Large HERCs and Small HERCs) include several genes (up to nine for the NEDD4 subfamily in vertebrates). However, the results obtained also hint at broader classification patterns that group together members of different subfamilies (as indicated above: HACE1-HUWE1; UBE3A-HECTX-HECTD2-Small HERCs; TRIP12-HECTD1; KIAA0614-HECTD3-Large HERCs). Many of the genes and proteins of these subfamilies have so far been barely explored. Thus, knowing of these cryptic relationships may be useful for designing experiments based on what is understood for members of closely related subfamilies.
The precise analysis of the patterns of presence/absence in multiple model organisms allowed me to establish the most parsimonious hypothesis for the evolution of HECT genes in animals (Figure ). Notably, most HECT subfamilies arose before the emergence of animals or very early in metazoan evolution. The number of novel subfamilies that emerged later is very low: just two of them (G2E3 and HACE1) appeared after the chordate/echinoderm split (Figure ). I conclude that, since the origin of animals, HECT genes have generally been either maintained or, in some lineages such as insects, nematodes and urochordates, lost. Only vertebrates have a number of HECT genes much higher than the one that can be deduced for early animals, due to specific duplications of both the NEDD4 and the Small HERC subfamily genes (Table ; Figure ). It is striking that all these patterns of diversification and streamlining are virtually identical, lineage by lineage, to the ones that I recently described for RBR ubiquitin ligases [14]. This suggests the presence of underlying selective forces acting on the evolution of the animal ubiquitination system as a whole, sometimes leading to its simplification by the progressive loss of E3s. The deep reasons that explain the parallelism observed for RBR and HECT ubiquitin ligases remain a mystery. Discovering what controls the patterns of duplication, conservation and loss of E3 proteins is another promising line of research.
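The parsimony reasoning used above for presence/absence patterns can be illustrated with a minimal sketch. It assumes Fitch parsimony on a rooted binary tree, which is a standard textbook method and not necessarily the procedure used in this study. The tree shape, the taxon labels (`A` to `E`) and the presence/absence values are hypothetical toy inputs, not data from this work.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, presence: Dict[str, bool]) -> Tuple[FrozenSet[bool], int]:
    """Return (possible states at this node, minimum number of gains/losses).

    Fitch parsimony for a single binary character (gene present or absent).
    """
    if isinstance(tree, str):
        # Leaf: the state is the observed presence/absence.
        return frozenset([presence[tree]]), 0
    left, right = tree
    left_states, left_changes = fitch(left, presence)
    right_states, right_changes = fitch(right, presence)
    shared = left_states & right_states
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # The children disagree: keep both options and count one change.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical toy example: the gene is missing only in taxon B.
toy_tree: Tree = ((("A", "B"), "C"), ("D", "E"))
toy_presence = {"A": True, "B": False, "C": True, "D": True, "E": True}

root_states, n_changes = fitch(toy_tree, toy_presence)
print(root_states, n_changes)
```

In this toy case the most parsimonious scenario places the gene at the root and requires a single change, a loss in the lineage leading to taxon B. This is the same kind of inference as the "maintained or lost" interpretation given above.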
The comparison of the emergence of the HECT genes and of the genes that encode known substrates of HECT proteins (Figures , and ) reveals a complex pattern that cannot be simply explained by the typical "textbook" processes that often follow gene duplications, such as neofunctionalization, in which one of the duplicates acquires new functions, or subfunctionalization, in which the functions of the original gene are divided between its duplicates. This is especially obvious for the members of the TGF-β signaling pathway regulated by NEDD4 subfamily HECTs (Figure ). I have found that there were four NEDD4 genes before animals emerged (Figure ). It is also well established that the TGF-β system is not present in choanoflagellates ([11] and analyses presented above). Therefore, the simplest expectation would be that, in early animal history, a single NEDD4 protein was co-opted to regulate the novel signaling system. However, the current mammalian data show that multiple, distantly related NEDD4 subfamily proteins are involved in the regulation of many (and often the same) proteins of the TGF-β pathway. This must be interpreted as evidence for multiple independent co-options of HECT proteins to perform similar roles in the TGF-β pathway. The caveat must be acknowledged that many of these results have been obtained in vitro or by overexpressing the E3s in cell culture assays, and often through directed searches devised to test multiple related NEDD4 proteins (e.g. [26]). It is therefore possible that not all the interactions described so far actually occur in whole organisms. However, if the pattern shown in Figure is basically correct (and the patterns shown in Figures and for other systems indeed have similar features), then it is a strong indication that enzymes that are part of complex families, with many members, may act on functionally related substrates in ways that do not follow any simple pattern and therefore may be largely unpredictable. Results of this type indicate that there may be significant shortcomings in our current models of how duplicated genes differentiate, evolve new functions and are preserved in genomes (see discussion in [27]). Often, simplistic expectations are not fulfilled, and what is really happening can only be understood by combining detailed phylogenetic analyses with functional data, as has been shown here and also, recently, in other related studies (e.g. [28]).
At present, data on substrates of HECT proteins are largely restricted to members of the NEDD4 subfamily. Out of 131 substrates that I have found in the literature, 92 were described for NEDD4 proteins (Additional file 3). Therefore, it is still impossible to have a well-defined picture of all the different roles of the HECT E3s as a whole. However, the available results clearly point to a general involvement in the control of many key pathways that impinge on the regulation of gene expression [4]. This is obvious from the data that I have shown in Figures , and , and it is reinforced by the large set of additional results showing that HECT E3s also regulate proteins that are either part of other signal transduction pathways (e.g. Notch, TrkA, insulin-like growth factor, interleukin receptors, etc.) or directly involved in gene expression and its regulation (RNA pol II, histones, TopBP1, c-Jun, etc.). These results may help to explain one of the main features associated with this family: many of its members have been found, in different ways, to be associated with multiple types of cancer [7]. The results shown here demonstrate that these members are not necessarily closely related, but often belong to different subfamilies. Further experimental results may help to clarify whether there is some subfamily specificity, in which members of different subfamilies have clearly distinct roles and primarily affect different cell types, or whether the roles of distant members of the HECT family are indeed substantially overlapping, as the data currently available suggest. For this line of future research, model species in which the number of HECT E3s has been greatly reduced (e.g. Caenorhabditis elegans, in which there are only 9 HECT genes; Table ) may be especially interesting, given that they may make it easier to sort out the roles of particular HECT ubiquitin ligases.