Reviewer 1: Andrei Osterman, Bioinformatics and Systems Biology Program, Sanford-Burnham Medical Research Institute, La Jolla, California, USA
An insightful and thorough study by Omelchenko et al. brings our attention to one of the most fascinating aspects of enzyme evolution: the prolific existence of protein families encoding non-homologous isofunctional enzymes. The authors provide a new census of well-documented cases of such "analogous" enzymes, revealing that this phenomenon is much more widespread than could have been expected in the early days of genomics. This new study was largely facilitated by advances in genome sequencing and structural genomics, which helped to correct some of the conclusions of their earlier analysis, especially with respect to distant homologs. A well-dosed combination of elegantly designed automated analysis with manual case-by-case investigation allowed the authors to generate a unique and highly useful dataset provided in the Supplementary Materials. By applying stringent criteria (distinct folds), Omelchenko et al. concluded that at least 1/10 of all enzymes with presently assigned complete EC numbers could have emerged in evolution more than once. Such a high level of evolutionary redundancy is quite remarkable. Another notable conjecture based on the detailed analysis of these data is that the evolution of analogous enzymes appears to be largely driven by recruitment from distinct structural families (folds) featuring similar reactions. It provides another vivid illustration of the patchwork pathway evolution hypothesis of R. Jensen [41]. The abundance of non-homologous isozymes in prokaryotes was shown to correlate with genome size, and their distribution among various folds reflects the functional versatility of popular folds (such as the TIM barrel and the Rossmann fold). This analysis sets the stage for further analysis of the interrelationships between evolutionary redundancy and the types of catalyzed chemical reactions. Overall, this study contributes to our appreciation of the abundance of alternative solutions for the same or similar functional tasks that have emerged in the course of evolution. In addition to its fundamental importance, this awareness, as well as the specific knowledge captured, would impact a number of applications in genomics (functional annotation and metabolic reconstruction), bioengineering (directed enzyme/pathway evolution) and drug discovery (identification of selective drug targets).
Authors' response: We thank the reviewer for these kind comments.
"Analogous enzymes": outside of the juxtaposition (analogous vs homologous), this term may be somewhat misleading. For example, if I heard that "these two enzymes are analogous" (outside of the context of your paper title, which is helpful), I would think that these are two enzymes (homologous or not) catalyzing similar (analogous) but not identical reactions (e.g. glucokinase and mannokinase). What you mean in fact is "non-homologous isofunctional" enzymes (or "non-homologous isozymes"). You actually made a step towards a better term with "analogous isoforms".
Authors' response: We agree. Actually, we found these comments so insightful and relevant that the phrase 'analogous enzymes' was replaced with 'non-homologous isofunctional enzymes' (NISE) throughout.
How did you deal with multifunctional/multidomain enzymes (such as RibF)? I understand that they could be handled similarly, but it might be worth mentioning in "Methods". A similar question: how do you deal with intrinsically multisubunit (hetero-oligomeric) monofunctional enzymes? For example, how would you treat our newly discovered three-subunit L-LDH (former YkgEFG [64]) vs LldD? The complexity arises when the actual roles of subunits (as well as cofactors) are not yet clear (I guess you would skip us for the lack of an EC number anyway?).
Authors' response: In the revised article, we added to the Methods a sentence on handling multifunctional (and multidomain) enzymes that have been assigned two or more EC numbers. As for multi-subunit enzymes, these were removed from the automatically processed set but typically re-examined in the course of manual analysis. The new L-lactate dehydrogenase was missed because its EC 188.8.131.52 already had been listed among the NISE owing to the presence of the Rossmann-fold and the L-sulfolactate dehydrogenase-like fold proteins.
You seem to be using quite a high hierarchical level (fold) to define "analogy". It is fine and safe. However, as far as I know, popular folds (e.g. TIM) may be shared by proteins that are perceived to be evolutionarily unrelated (non-homologous). This apparently reflects convergent evolution in the fold space (some folds simply emerge and stick with higher probability?). Is that true? If yes, then you may be underestimating the number of genuinely analogous (in the evolutionary sense) pairs.
Authors' response: This is a very interesting and thorny point. Indeed, distinct forms of the same fold, especially in the case of versatile, abundant folds like the TIM barrel, are often considered non-homologous. From that perspective, by considering solely distinct folds, we might be underestimating the total number of non-homologous enzyme pairs. However, we are not sure that claims of convergent emergence of the same fold are valid. Of course, this is a fundamental issue in the evolution of proteins that we would not attempt to solve in this paper, which is dedicated to a different aspect of biochemical evolution. Moreover, even structures assigned to different folds might still be evolutionarily related, which would lead to an overestimation of the number of 'truly non-homologous' enzymes. All in all, we believe that the requirement that alternative enzyme isoforms have distinct folds to be considered non-homologous provides a reasonable and straightforward approach to the search for NISE. A more permissive approach to the identification of alternative enzyme isoforms has been recently used by others.
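The distinct-fold criterion discussed in this exchange can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the record layout, and the placeholder EC and protein labels are all hypothetical, and fold assignments are assumed to be available already (for example, from a structural classification).

```python
from collections import defaultdict


def find_nise_candidates(annotations):
    """Return {ec_label: sorted fold names} for EC nodes spanning >= 2 folds.

    `annotations` is an iterable of (protein_id, ec_label, fold) tuples.
    This layout is an assumption made for illustration only.
    """
    folds_by_ec = defaultdict(set)
    for _protein_id, ec_label, fold in annotations:
        folds_by_ec[ec_label].add(fold)
    # An EC node is a candidate only if its proteins fall into distinct folds.
    return {
        ec_label: sorted(folds)
        for ec_label, folds in folds_by_ec.items()
        if len(folds) >= 2
    }


# Hypothetical toy records; the EC labels are placeholders, not real EC numbers.
toy_annotations = [
    ("protein_1", "EC_A", "TIM barrel"),
    ("protein_2", "EC_A", "Rossmann fold"),
    ("protein_3", "EC_A", "TIM barrel"),
    ("protein_4", "EC_B", "Rossmann fold"),
    ("protein_5", "EC_B", "Rossmann fold"),
]

candidates = find_nise_candidates(toy_annotations)
print(candidates)
```

Under this conservative rule, divergent proteins that share a fold (as in the placeholder node EC_B) are never reported, which is the source of the possible underestimation raised by the reviewer.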
"Likewise, two ubiquitin thiolesterases, ... actually possess distinct activities, cleaving polyubiquitin chains linked, respectively, to Lys-48 and Lys-63 residues [18]." I either miss something or disagree. If, indeed, the only difference is the position of the polyubiquitinated lysine in the substrate protein, they should be considered analogous. The same applies to any enzymes involved in PTMs or processing of biopolymers: kinases, proteases and so forth. There is no straightforward way to encode their "site-specificity"; therefore, in my opinion, for this type of analysis even trypsin and chymotrypsin should be considered as one: "serine endopeptidase of the chymotrypsin family".
Authors' response: The original language was indeed imprecise. The Lys-48 and Lys-63 residues are amino acid residues of ubiquitin, not of the ubiquitinated protein substrate. The sentence is corrected to reflect this fact in the revised manuscript. The respective polyubiquitin chains are distinct molecules, so the two thiolesterases probably should not be in the list of NISE.
I notice that you have missed one of the FGGY kinases: RbtK - D-ribulokinase (EC 2.7.1.47). I disagree that "...Glycerol kinase ...catalyzes a reaction of lipid metabolism..." It is primarily catabolism of glycerol in bacteria. Just skip this statement or be more inclusive.
Authors' response: Corrected: we included EC 2.7.1.47 in the text (but not in the figure) and changed 'lipid metabolism' to 'glycerol metabolism'.
The sentence "It appears that in each case, gluconate kinase was recruited from a family of kinases with activities toward closely related substrates." is not incorrect, but I feel that it puts the emphasis in the wrong place. The key is that recruitment happens from families with the same type of chemical reaction (e.g. phosphorylation). A similar or dissimilar substrate is (a) an ambiguous notion (is adenylsulfate similar to gluconate?) and (b) not that important (glycerol and gluconate are distinct enough). How about referring to the classic "patchwork hypothesis" of Jensen in this discussion? We actually provided a penny to it in our Science paper [67].
Authors' response: We agree, corrected.
Discussion of patchy distribution is hard to appreciate without bringing up the issue of HGT. Have you examined patchy "analogous enzymes" (especially between Archaea and Bacteria) as a possible outcome of HGT?
Authors' response: We agree that the patchy distribution of analogous enzymes is most likely a consequence of rampant horizontal gene transfer, and this is explicitly mentioned in the text: "Non-homologous isoforms of enzymes seem to be recruited from pre-existing enzymes with related activities and specificities following duplication or horizontal transfer (apparently, the principal route of innovation in prokaryotes, where NISE are most common) of the respective genes." More specific and detailed analysis of the origins of NISE sets is of definite interest but beyond the scope of this paper.
In the first paragraph of Discussion, you mention that you use "distinct folds (as) the ultimate proof of analogy", which is fine. However, you could mention that it might be another cause for underestimation of the extent of analogy among enzymes. Likewise, in addition to enzymes that "have not been biochemically characterized" there are also enzymes that were characterized but have not made it to EC nomenclature (or/and public databases like KEGG - just wonder whether you even thought of using SEED for metabolic enzymes, you could find a few interesting cases on top of what you have).
Authors' response: We have not used SEED in this work but hope to employ it in the next phase of this project. We have performed a literature search for potential cases of analogous enzymes but that search was not comprehensive.
"...recruited from pre-existing enzymes of related specificities..." - same comment. Not wrong but wrong emphasis. Chemistry (type of reaction) is clearly more important for recruitment than "substrate specificity". In the extreme case of "retrograde concept" one would expect glucose isomerase to be recruited from hexokinase family, which is not the main route.
Authors' response: We agree, changed to 'activities and specificities'.
"This has been recognized and successfully dealt with in the case of peptidases (EC 3.4.x.x. [50]) but remains a problem for various protein kinases and protein phosphatases..." I disagree with this view and interpretation (I already expressed it about proteases), but I won't argue. I am sure that the plurality of protein kinases and phosphatases is driven by other factors (including the "simplicity" of the reaction and the high "demand" in regulatory networks).
Authors' response: We believe that the disagreement here, if any, is semantic rather than substantial. From the purely operational point of view, we are interested in whether there are multiple unrelated isoforms that are capable of acting on the same substrate and performing the same biochemical reaction. We agree with the reviewer's view on the driving factors behind the observed plurality of kinases and phosphatases.
In Conclusions, it is important to choose words carefully to make the message clear. For example: "Sets of analogous, unrelated enzymes were detected for a substantial minority..." I would say at least "Sets of analogous, evolutionary unrelated enzymes (nonhomologous isoforms) were detected for a substantial fraction (up to 10%)".
Authors' response: Changed as suggested.
"...unrelated mechanistic solutions can evolve". Although this claim is not incorrect, it cannot be directly deduced from the existence of "analogous enzymes". As an example, chymotrypsins and subtilisins are both serine proteases (i.e. they use the same mechanism) while having distinct folds and evolutionary origins. I mean that this claim would require a separate analysis of mechanisms. The only solid claim is that the same chemical solutions (with the same or distinct mechanisms) can evolve independently (functional, but not necessarily mechanistic, convergence).
Authors' response: We agree, changed to 'evolutionarily unrelated solutions'.
Reviewer 2: Keith F. Tipton, School of Biochemistry and Immunology, Trinity College, Dublin, Ireland (nominated by Martijn Huynen)
This is a welcome update of the paper on analogous enzymes published by these authors in 1998. It contains much useful information and analysis. The supplementary Table S2, also available on-line, is particularly valuable. Some points that the authors should consider are listed below.
By concentrating on catalytic function in their discussions of evolutionary pressure, the authors may be missing the fact that an increasing number of enzymes are now recognized to be multifunctional (sometimes also called "moonlighting") proteins, with alternative, distinct, functions that may also be species-specific. Lists of several of these have been published (e.g., [68]). This indicates that the evolutionary pressures may be more complicated. The authors might consider referring to such complexities, perhaps in the context of their statement that "the existence of analogy shows that, at least, for numerous and diverse biochemical problems, unrelated mechanistic solutions can evolve".
Authors' response: Moonlighting is a very interesting phenomenon that is, however, only tangentially related to the issue of NISE (analogous enzymes). The very definition of "moonlighting proteins" as those that "have two different functions within a single polypeptide chain" refers primarily to enzymes having additional non-enzymatic functions (e.g. transcriptional regulator, membrane receptor, growth factor, structural component, and so on). The above-cited reviews mention a single example of an enzyme with two entirely different enzymatic activities, the monomer of glyceraldehyde-3-phosphate dehydrogenase supposedly acting as uracil-DNA glycosylase, which still remains controversial. In contrast, multifunctional enzymes usually turn out to consist of two or more different domains. In all these examples of "multitasking", the evolutionary constraints are very different from those encountered by non-homologous enzymes that evolved to catalyze the same biochemical reaction.
Reviewer's comment to the authors' response: The problem of 'moonlighting' is surely that the evolutionary pressures on the alternative, non-enzymic, function(s) may be different from those on the catalytic function and thus cannot be ignored when considering the pressure on the catalytic function. Of course, much of the literature assumes that the catalytic function is the main one, but in some cases this may be doubtful.
There are also cases of catalytic promiscuity where an enzyme catalyses distinct types of reaction (see e.g., [75]). If the reactions are sufficiently different, this should result in different EC numbers being assigned to the same protein. Furthermore, there are multifunctional proteins catalysing different steps of an overall process, such as tryptophan synthase (EC 4.2.1.20) in some species. Thus, both 'one-to-many' and 'many-to-one' relationships between EC numbers and proteins are possible. The former represents a problem which, as the authors rightly point out, remains to be resolved for families such as the protein kinases, where a recognised enzyme, such as PKC-alpha, may have several distinct substrates (see [76]) and one protein substrate may be phosphorylated by more than one kinase.
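The two kinds of relationship between EC numbers and proteins mentioned here can be made concrete with a small sketch. It only illustrates the bookkeeping; the function name and the placeholder protein and EC labels are hypothetical and are not taken from the manuscript or from any database.

```python
from collections import defaultdict


def classify_relationships(pairs):
    """Split (protein, ec_label) pairs into the two non-trivial cases.

    Returns a tuple of two dicts:
      - proteins annotated with more than one EC label
      - EC labels assigned to more than one protein
    """
    ecs_by_protein = defaultdict(set)
    proteins_by_ec = defaultdict(set)
    for protein, ec_label in pairs:
        ecs_by_protein[protein].add(ec_label)
        proteins_by_ec[ec_label].add(protein)
    multi_ec_proteins = {
        p: sorted(ecs) for p, ecs in ecs_by_protein.items() if len(ecs) > 1
    }
    multi_protein_ecs = {
        ec: sorted(ps) for ec, ps in proteins_by_ec.items() if len(ps) > 1
    }
    return multi_ec_proteins, multi_protein_ecs


# Hypothetical toy pairs; all labels are placeholders.
toy_pairs = [
    ("protein_X", "EC_1"),
    ("protein_X", "EC_2"),
    ("protein_Y", "EC_2"),
    ("protein_Z", "EC_3"),
]

multi_ec_proteins, multi_protein_ecs = classify_relationships(toy_pairs)
print(multi_ec_proteins)
print(multi_protein_ecs)
```

Only the second dictionary is relevant to a search for non-homologous isofunctional enzymes, and even then the proteins sharing an EC label would still need to be compared structurally.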
Authors' response: Catalytic promiscuity, when alternative chemical reactions take place in essentially the same active site, is an important factor in enzyme evolution. As discussed above, we believe it to be a major source of NISE.
As the authors recognise, the EC classification system is, or should be, solely based on the overall reaction catalysed. As such, it is concerned with neither protein-sequence nor mechanistic differences and it is, perhaps, not surprising to find different proteins catalysing the same reaction. In this context, the suggestion "Nevertheless, it seems reasonable to consider expanding the EC system by officially recognizing the notion of a "class" within an EC node, such as, for example, superoxide dismutase (EC 1.15.1.1) class I, class II, and so on", might be clarified, since it would constitute a departure from the strict reaction-catalysed criterion and could risk detracting from its present utility. The authors should clarify what "classes" they propose should be included; would it be all analogous and homologous enzymes encompassed by each EC number? In some cases such material may be dealt with, more adequately, by complementary databases, which rely on the EC system. For enzymes that have different mechanisms of action, the problem might best be resolved through systems such as the MACiE (Mechanism, Annotation and Classification in Enzymes) database [79] or its offshoot Metal MACiE [80]. However, although MACiE does deal with the different mechanisms of the class I & II aldolases (EC 4.1.2.13), only the Cu/Zn superoxide dismutase is listed in these databases at present.
Authors' response: Adding the notion of a "class" to the EC system is only one of a number of possible ways to deal with NISE. Having supplementary specialized databases of enzyme mechanisms, such as MACiE, or sequence-based profiles, such as PRIAM, would be less intrusive but would force the users to rely on those outside sources for important information on the diversity of the enzymes in each EC node. This work identified NISE for almost 8% of all EC nodes, and many more EC nodes include divergent enzyme isoforms that still belong to the same superfamilies. Given the scope of the problem, we felt that it should be brought to the attention of Prof. Tipton and other members of the Enzyme Commission.
The authors refer to the "strict reliance on substrate specificity" being "a cause for certain confusion when the EC numbers are applied to mapping reactions on the metabolic map" and give the example of the enzymes that could catalyse the oxidation of D-glucose. It is not clear why they regard this as a problem. Surely it is beneficial to be able to find all the enzymes that may contribute to a metabolic process, as, for example, in the approach adopted by Reaction Explorer [81], and then to investigate the extent to which each does contribute to it, if at all?
Authors' response: Although we agree in principle, the decision on whether a certain pathway is operational in a certain organism often hinges on the presence or absence of a small group of pathway-specific enzymes. In such cases, non-critical application of EC numbers may lead researchers to an erroneous assertion of the presence - or absence - of a given reaction (and hence the whole pathway) in the given genome.
A problem, which the authors touch upon, is that of broad-specificity enzymes, such as alcohol dehydrogenase (EC 1.1.1.1) and monoamine oxidase (EC 1.4.3.4), where the reaction is described in general terms, with little or no indication of all the substrates that may be involved. Such information, where known, can be found in the BRENDA database [83]. Similarly, the Enzyme List does not aim to give detailed species information, since that can also be found in the BRENDA database.
Authors' response: Although we agree, we have to note that this arrangement makes the BRENDA database the sole provider of this critically important information. In our opinion, the EC system might benefit from inclusion of this type of data.
Reviewer's comment to the authors' response: BRENDA is not the only source of specificity data and I did not intend to imply that it was. KEGG also gives such information. We collaborate closely with both databases, and take the view that if they are doing a good job, why should we want to duplicate them?
I am still not clear what you may have in mind by 'adding a class'. We have received many suggestions in the past for an additional EC digit to cover several diverse areas, including mechanism, medically relevant enzymes, enzymes from different species, isoenzymes etc. So far we have decided that this would not be helpful. The alternative might be adding a 'NISE' field to each entry but, as mentioned above, a direct link to the corresponding PRIAM page might be more helpful.
Reviewer 3: Igor B. Zhulin, University of Tennessee - Oak Ridge National Lab., Oak Ridge, Tennessee, USA
This paper extends the authors' previous work identifying analogous enzymes more than a decade ago. The authors expanded their search methods by utilizing both the Swiss-Prot database and the KEGG database to better associate proteins with enzymatic activity. By the authors' own admission, no strong trends were observed in the dataset, but they were able to identify a few very interesting patterns, including enrichment of analogous enzymes among glycoside hydrolases, enzymes involved in oxidative stress relief, and among the TIM barrel and NAD(P)-binding Rossmann structural folds. As expected, the authors find that the number of analogous enzymes scales with increasing genome size. The authors discuss the evolutionary origins of some of the trends noted above, as well as the limitations imposed on their identification schemes by the EC numbering system itself. Overall, I do like this paper a lot, especially because in my lab we have recently become interested in one particular family of analogous enzymes. So, I enjoyed looking at a bigger picture, while picturing our own work in its context.
The analysis scheme employed is straightforward and utilizes proven bioinformatic methodology. The authors appear to utilize conservative criteria for inclusion of data for the analysis, so the results are likely to under-predict rather than over-predict analogous enzymes. The results greatly expand the listing of analogous enzymes, and the extensive supplementary material provides useful information for specialists interested in any particular family of enzymes.
The inclusion of numerous genomes through the use of the KEGG database allowed analysis of analogous enzymes to be conducted on a sufficient scale to give a fairly good approximation of their relative abundance and the importance of analogous inventions during evolution. The coverage of structural information, sequence, and biological information seems to be such that the boundaries for the proportion of analogous enzymes (~10% of the EC nodes) seem unlikely to change significantly with future genome sequencing.
Lack of true novelty in this analysis is a minor quibble, as it generated a useful resource in and of itself and the specific cases highlighted are of interest in a number of fields. The use of EC number annotations may be suspect in some cases where the traditional sequence similarity based annotation methods are unreliable or where the EC definitions are inadequate. I can offer a couple of examples, where we happened to dig around a little bit. For instance, Table S1 lists a couple of cellulases (entries #78 and #91) in glycoside hydrolase families 10 and 11. It appears that there are no experimentally defined cellulases in these families, and the enzymes shown are putative xylanases. It also might be just a matter of semantics, since these enzymes are likely to be hemicellulases (technically could be called cellulases, I guess). Anyway, the authors are fully aware of the limitations imposed and there is no way to verify available experimental evidence for each and every entry in such a large-scale effort. The vast majority of the enzymes included in the study are readily identified by sequence similarity based annotations, so the conclusions as a whole are sound.
Authors' response: We fully agree with these comments.