The writing of this review coincides with the 40th anniversary of Crick’s seminal paper on the evolution of the genetic code (6) that synthesized the preceding research in this area and presciently outlined the principal lines of thinking on this difficult subject. In our opinion, despite extensive and, in many cases, elaborate attempts to model code optimization, ingenious theorizing along the lines of the coevolution theory, and considerable experimentation, very little definitive progress has been made.
Of course, this does not mean there has been no advance in understanding aspects of the code’s evolution. Some clear conclusions are negative, i.e., they allow one to rule out certain a priori plausible possibilities. Thus, many years of experimentation, including the latest extensive studies on aptamer selection, show that the code is not based on a straightforward stereochemical correspondence between amino acids and their cognate codons (or anticodons). Direct interactions between amino acids and polynucleotides might have been important at some early stages of the code’s evolution but could hardly have been its principal factor. Almost the same seems to apply to the coevolution theory: the possibility exists that evolution of amino acid metabolism and evolution of the code were, to some extent, linked, but this coevolution cannot fully explain the properties of the code. The verdict on the adaptive theory of code evolution, in particular, the hypothesis that the code was shaped by selection for error minimization, is different: in our view, this is the only concept of the code’s evolution that can legitimately claim to be positively relevant because, so far, no attempt to explain the observed robustness of the code to translation errors without invoking at least some degree of selection has been convincing. So it does appear that selection for translation error minimization played a substantial role in the evolution of the code to the standard form. However, there is also a flip side to the adaptive theory: the standard code appears not to be particularly outstanding in terms of error minimization and, apparently, is easily reachable from a random code with the same block structure. Statements like “the genetic code is one in a million” (or even in 100 million) are technically accurate but can be easily misconstrued should one overlook the fact that there is a huge number of possible codes that are significantly more robust than the standard code, which sits on the slope of an unremarkable local peak in an extremely rugged fitness landscape.
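The kind of comparison behind “one in a million” statements can be illustrated with a short sketch. It is not the fitness function of Eq (I) and does not reproduce any published estimate. It assumes a simplified cost (the mean squared change in an amino acid property over all single-base substitutions, with every substitution weighted equally), uses the Kyte-Doolittle hydropathy scale as a stand-in for the property, and generates random codes by permuting amino acids among the synonymous codon blocks of the standard code while keeping the stop codons fixed. All function names (`cost`, `shuffled_code`, `fraction_better`) are hypothetical and chosen only for this illustration.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy index: an ASSUMED stand-in for the amino acid
# property; published analyses of the code often use other measures.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def cost(code):
    """Mean squared property change over all single-base substitutions.

    Substitutions to or from stop codons are skipped and all remaining
    substitutions are weighted equally (a simplifying assumption).
    """
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbor = code[codon[:pos] + base + codon[pos + 1:]]
                if neighbor == "*":
                    continue
                total += (HYDROPATHY[aa] - HYDROPATHY[neighbor]) ** 2
                n += 1
    return total / n


def shuffled_code(rng):
    """Random code with the block structure of the standard code:
    the 20 amino acids are permuted among the synonymous codon blocks."""
    amino_acids = sorted(set(AMINO) - {"*"})
    permuted = amino_acids[:]
    rng.shuffle(permuted)
    mapping = dict(zip(amino_acids, permuted))
    mapping["*"] = "*"  # stop codons stay in place
    return {codon: mapping[aa] for codon, aa in STANDARD.items()}


def fraction_better(n_samples, seed=0):
    """Fraction of sampled random codes with a lower cost than the standard code."""
    rng = random.Random(seed)
    reference = cost(STANDARD)
    better = sum(cost(shuffled_code(rng)) < reference for _ in range(n_samples))
    return better / n_samples


if __name__ == "__main__":
    print("cost of the standard code:", round(cost(STANDARD), 3))
    print("fraction of random codes that do better:", fraction_better(10000))
```

A small fraction returned by `fraction_better` says only that the standard code outperforms most randomly drawn codes under this particular cost. Because there are 20! permutations of this kind alone, even a tiny fraction still corresponds to an enormous absolute number of more robust codes, which is the caveat made above.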
Of course, it cannot be ruled out that the fitness functions employed in modeling selection for error minimization (Eq (I) and similar ones) in the evolution of the code are far from being an accurate representation of the “real” optimization criterion. Should that be the case, the general assessment of the entire field of code evolution would have to be particularly somber because that would imply we have no clue as to what is important in a code. However, this does not seem to be a particularly likely possibility. Indeed, recent theoretical and empirical studies on correlations between gene sequence evolution and expression strongly suggest that minimization of the production of potentially toxic misfolded proteins is a crucial factor in evolution (127). It stands to reason that minimization of protein misfolding has driven evolution concordantly at several levels, including protein sequences, codon usage (130) and the genetic code itself. Furthermore, general considerations, stemming from Eigen’s theory of quasispecies and mutational meltdown, indicate that, for any complex life to evolve, sufficient robustness of replication and expression is a prerequisite (131). Thus, these more general lines of reasoning from evolutionary biology seem to complement the results of specific modeling of the code’s evolution.
And then, there is, of course, frozen accident, Crick’s famous “non-explanation” that, even after 40 years of increasingly sophisticated research, still appears relevant for the problem of the code’s origin and evolution. Indeed, given the relatively modest optimization level of the standard code, it appears essentially certain that the evolution of the code involved some combination of frozen accident with selection for error minimization. Whether or not other recognized and/or still unknown factors also contributed remains a matter to be addressed in further theoretical, modeling and experimental research.
Before closing this discussion, it makes sense to ask: do the analyses described here, focused on the properties and evolution of the code per se, have the potential to actually solve the enigma of the code’s origin? Such potential appears doubtful because, out of necessity, to make the problems they address tractable, all studies of the code’s evolution are performed in formalized and more or less artificial settings (be it modeling under a defined set of code transformations or aptamer selection experiments), the relevance of which to the reality of primordial evolution is dubious at best. The hypothesis on the causal connection between the universality of the code and the collective character of primordial evolution characterized by extensive genetic exchange between ensembles of replicators (118) is attractive and appears conceptually important because it takes the study of code evolution from being a purely formal exercise into a broader and more biologically meaningful context. Nevertheless, this proposal, even if quite plausible, is only one facet of a much more general and difficult problem, perhaps the most formidable problem in all of evolutionary biology. Indeed, it stands to reason that any scenario of the code’s origin and evolution will remain vacuous if not combined with an understanding of the origin of the coding principle itself and the translation system that embodies it. At the heart of this problem is a dreary vicious circle: what would be the selective force behind the evolution of the extremely complex translation system before there were functional proteins? And, of course, there could be no proteins without a sufficiently effective translation system. A variety of hypotheses have been proposed in attempts to break the circle (see (132) and references therein), but so far none of these seems to be sufficiently coherent or to enjoy sufficient support to claim the status of a real theory.
It seems that detailed modeling of the code evolution from simpler predecessors such as doublet codes could offer some new windows into the early stages of the evolution of coding (72). Notably, backtracking the standard code to the most likely doublet versions yields codes with an exceptional, nearly maximum error minimization capacity (ASN and EVK, unpublished), an observation that moves selection for error minimization and/or frozen accident at least one step closer to the actual origin of translation. Nevertheless, these and other theoretical approaches lack the ability to take the reconstruction of the evolutionary past beyond the complexity threshold that is required to yield functional proteins, and we must admit that concrete ways to cross that horizon are not currently known.
On the experimental front, findings on the catalytic capabilities of selected ribozymes are impressive (136). In particular, highly efficient self-aminoacylating ribozymes and ribozymes that catalyze the peptidyltransferase reaction have been obtained (137). Moreover, ribozymes whose catalytic activity is stimulated by peptides have been selected (139), hinting at the possible origins of the RNA-protein connection (133). Nevertheless, in a close analogy to the situation with theoretical approaches, we are unaware of any experiments that would have the potential to actually reconstruct the origin of coding, not even at the stage of serious planning.
Summarizing the state of the art in the study of the code’s evolution, we cannot escape considerable skepticism. It seems that the two-pronged fundamental question, “why is the genetic code the way it is and how did it come to be?”, which was asked over 50 years ago, at the dawn of molecular biology, might remain pertinent even in another 50 years. Our consolation is that we cannot think of a more fundamental problem in biology.