Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
IUBMB Life. Author manuscript; available in PMC 2012 March 5.
Published in final edited form as:
PMCID: PMC3293468

Origin and evolution of the genetic code: the universal enigma


The genetic code is nearly universal, and the arrangement of the codons in the standard codon table is highly non-random. The three main concepts on the origin and evolution of the code are the stereochemical theory, according to which codon assignments are dictated by physico-chemical affinity between amino acids and the cognate codons (anticodons); the coevolution theory, which posits that the code structure coevolved with amino acid biosynthesis pathways; and the error minimization theory under which selection to minimize the adverse effect of point mutations and translation errors was the principal factor of the code’s evolution. These theories are not mutually exclusive and are also compatible with the frozen accident hypothesis, i.e., the notion that the standard code might have no special properties but was fixed simply because all extant life forms share a common ancestor, with subsequent changes to the code, mostly, precluded by the deleterious effect of codon reassignment. Mathematical analysis of the structure and possible evolutionary trajectories of the code shows that it is highly robust to translational misreading but there are numerous more robust codes, so the standard code potentially could evolve from a random code via a short sequence of codon series reassignments. Thus, much of the evolution that led to the standard code could be a combination of frozen accident with selection for error minimization although contributions from coevolution of the code with metabolic pathways and weak affinities between amino acids and nucleotide triplets cannot be ruled out. However, such scenarios for the code evolution are based on formal schemes whose relevance to the actual primordial evolution is uncertain. A real understanding of the code origin and evolution is likely to be attainable only in conjunction with a credible scenario for the evolution of the coding principle itself and the translation system.


Shortly after the genetic code of Escherichia coli was deciphered (1), it was recognized that this particular mapping of 64 codons to 20 amino acids and two punctuation marks (start and stop signals) is shared, with relatively minor modifications, by all known life forms on earth (2, 3). Even a perfunctory inspection of the standard genetic code table (Fig. 1) shows that the arrangement of amino acid assignments is manifestly nonrandom (47). Generally, related codons (i.e., the codons that differ by only one nucleotide) tend to code for either the same or two related amino acids, i.e., amino acids that are physico-chemically similar (although there are no unambiguous criteria to define physicochemical similarity). The fundamental question is how these regularities of the standard code came into being, considering that there are more than 1084 possible alternative code tables if each of the 20 amino acids and the stop signal are to be assigned to at least one codon. More specifically, the question is, what kind of interplay of chemical constraints, historical accidents, and evolutionary forces could have produced the standard amino acid assignment, which displays many remarkable properties. The features of the code that seem to require a special explanation include, but are not limited to, the block structure of the code, which is thought to be a necessary condition for the code’s robustness with respect to point mutations, translational misreading, and translational frame shifts (8); the link between the second codon letter and the properties of the encoded amino acid so that codons with U in the second position correspond to hydrophobic amino acids (9, 10); the relationship between the second codon position and the class of aminoacyl-tRNA synthetase (11), the negative correlation between the molecular weight of an amino acid and the number of codons allocated to it (12, 13); the positive correlation between the number of synonymous codons for an amino acid and the frequency of the amino acid in proteins (14, 15); the apparent minimization of the likelihood of mistranslation and point mutations (16, 17); and the near optimality for allowing additional information within protein coding sequences (18).

Figure 1
The standard genetic code. The codon series are shaded in accordance with the polar requirement scale values (80), which is a measure of an amino acid’s hydrophobicity: the greater hydrophobicity the darker the shading (the stop codons are shaded ...

When considering the evolution of the genetic code, we proceed under several basic assumptions that are worth spelling out. It is assumed that there are only 4 nucleotides and 20 encoded amino acids (with the notable exception of selenocysteine and pyrrolysine, for which subsets of organisms have evolved special coding schemes (19), see also discussion below) and that each codon is a triplet of nucleotides. It has been argued that movement in increments of three nucleotides is a fundamental physical property of RNA translocation in the ribosome so that the translation system originated as a triplet-based machine (2022). Obviously, this does not rule out the possibility that, e.g., only two nucleotides in each codon are informative (see, e.g., (2326) for hypotheses on the evolution of the code through a “doublet” phase). Questions on why there are four standard nucleotides in the code (27, 28) or why the standard code encodes 20 amino acids (2931) are fully legitimate. Conceivably, theories on the early phases of the evolution of the code should be constrained by the minimal complexity that is required of a self-replicating system (e.g., (32)). However, this fascinating are of enquiry is beyond the scope of this review, and for the present discussion we adopt the above fundamental numbers as assumptions. With these premises, we here attempt to critically assess and synthesize the main lines of evidence and thinking about the code’s nature and evolution.

The code is evolvable

The code expansion theory proposed in Crick’s seminal paper posits that the actual allocation of amino acids to codons is mainly accidental and ‘yet related amino acids would be expected to have related codons’ (6). This concept is known as ‘frozen accident theory’ because Crick maintained, following the earlier argument of Hinegardner and Engelberg (2) that, after the primordial genetic code expanded to incorporate all 20 modern amino acids, any change in the code would result in multiple, simultaneous changes in protein sequences and, consequently, would be lethal, hence the universality of the code. Today, there is ample evidence that the standard code is not literally universal but is prone to significant modifications, albeit without change to its basic organization.

Since the discovery of codon reassignment in human mitochondrial genes (33), a variety of other deviations from the standard genetic code in bacteria, archaea, eukaryotic nuclear genomes and, especially, organellar genomes have been reported, with the latest census counting over 20 alternative codes (3438). All alternative codes are believed to be derived from the standard code (35); together with the observation that many of the same codons are reassigned (compared to the standard code) in independent lineages (e.g., the most frequent change is the reassignment of the stop codon UGA to tryptophan), this conclusion implies that there should be predisposition towards certain changes; at least one of these changes was reported to confer selective advantage (39).

The underlying mechanisms of codon reassignment typically include mutations in tRNA genes, where a single nucleotide substitution directly affects decoding (40), base modification (41), or RNA editing (42) (reviewed in (35)). Another pathway of code evolution is recruitment of non-standard amino acids. The discovery of the 21st amino acid, selenocysteine, and the intricate molecular machinery that is involved in the incorporation of selenocysteine into proteins (43) initially has been considered a proof that the current repertoire of amino acids is extremely hard to change. However, the subsequent discovery of the second non-canonical amino acid, pyrrolysine, and, importantly, the existence of a pyrrolysine-specific tRNA revealed additional malleability of the code (19, 44). In addition to the variations on the standard code discovered in organisms with minimized genomes, many experimental attempts on code modification and expansion have been reported (45). Recently, a general method has been developed to encode the incorporation of unnatural amino acids in genomes by recruiting either one of the stop codons or a subset of a codon series for a particular amino acid and engineering the cognate tRNA and aminoacyl-tRNA synthetase (46). The application of this methodology has already allowed incorporation in E. coli proteins of over 30 unnatural amino acids, in a striking demonstration of the potential malleability of the code (45, 46).

Three major theories have been suggested to explain the changes in the code. The ‘codon capture’ theory (47, 48) proposes that, under mutational pressure to decrease genomic GC-content, some GC-rich codons might disappear from the genome (particularly, a small, e.g., organellar, genome). Then, due to random genetic drift, these codons would reappear and would be reassigned as a result of mutations in non-cognate tRNAs. This mechanism is essentially neutral, i.e., codon reassignment would occur without generation of aberrant or non-functional proteins.

Another concept of code alteration is the ‘ambiguous intermediate’ theory which posits that codon reassignment occurs through an intermediate stage where a particular codon is ambiguously decoded by both the cognate tRNA and a mutant tRNA(49, 50). An outcome of such ambiguous decoding and the competition between the two tRNAs could be eventual elimination of the gene coding for the cognate tRNA and takeover of the codon by the mutant tRNA (37, 51). The same mechanism might also apply to reassignment of a stop codon to a sense codon, when a tRNA that recognizes a stop codon arises by mutation and captures the stop codon from the cognate release factor. Under the ambiguous intermediate hypothesis, a significant negative impact on the survival of the organism could be expected but the finding that the CUG codon (normally coding for leucine) in the fungus Candida zeylanoides is decoded as either leucine (3–5%) or serine (95–97%) gave credence to this scenario (37, 52).

Finally, evolutionary modifications of the code have been linked to ‘genome streamlining’ (53, 54). Under this hypothesis, the selective pressure to minimize mitochondrial genomes yields reassignments of specific codons, in particular, one of the three stop codons.

The three theories explaining codon reassignment are not exclusive considering that the ‘ambiguous intermediate’ stage can be preceded by a significant decrease in the content of GC-rich codons, so that codon reassignment might be driven by a combination of evolutionary mechanisms (55), often under the pressure for genome minimization, especially, in organellar genomes and small genomes of parasitic bacteria such as mycoplasmas (38, 54, 56, 57).

The basic theories of the code nature, origin and evolution

The existence of variant codes and the success of experiments on the incorporation of unnatural amino acids briefly discussed in the preceding section indicates that the genetic code has a degree of evolvability. However, all these deviations involve only a few codons, so in its main features, the structure of the code seems not to have changed through the entire history of life or, more precisely, at least, since the time of the Last Universal Common Ancestor (LUCA) of all modern (cellular) life forms. This universality of the genetic code and the manifest non-randomness of its structure cry for an explanation(s). Of course, Crick’s frozen accident/code expansion theory can be considered a default explanation that does not require any special mechanisms and is only predicated on the existence of a LUCA with a an advanced translation system resembling the modern one (that is, the implicit assumption is that LUCA was not a “progenote” with primitive, very inaccurate translation (58)). However, this explanation is often considered unsatisfactory, first, on the most general, epistemological grounds, because it is, in a sense, a non-explanation, and second, because the existence of variant codes and the additional, experimentally revealed flexibility of the code (see above) present a challenge to the frozen-accident view. Indeed, the fact that there seem to be ways to “sneak in” changes to the standard code, and yet, the same limited modifications seem to have evolved independently in diverse lineages suggests that the code structure could be non-accidental. Three, not necessarily mutually exclusive main theories have been proposed in attempts to attribute the pattern of amino acid assignments in the standard genetic code to physico-chemical or biological factors or a combination thereof. Rather remarkably, the central ideas of each of these theories have been formulated during the classic age of molecular biology, not long after the code was deciphered or even earlier, and despite numerous subsequent developments, remain relevant to this day. We first briefly outline the three theories in their respective historical contexts and then discuss the current status of each.

  1. The stereochemical theory asserts that the codon assignments for particular amino acids are determined by a physicochemical affinity that exists between the amino acids and the cognate nucleotide triplets (codons or anticodons). Thus, under this class of models, the specific structure of the code is not at all accidental but, rather, necessary and, possibly, unique. The first stereochemical model was developed by Gamow in 1954, almost immediately after the structure of DNA has been resolved and, effectively, along with the idea of the code itself (59). Gamow proposed an explicit mechanism to relate amino acids and rhomb-shaped ‘holes’ formed by various nucleotides in DNA. Subsequently, after the code was deciphered, more realistic stereochemical models have been proposed (6062) but were generally deemed improbable due to the failure of direct experiments to identify specific interactions between amino acids and cognate triplets (5, 6). Nevertheless, the inherent attractiveness of the stereochemical theory which, if valid, makes it much easier to see how the code evolution started, stimulated further experimental and theoretical activity in this area.
  2. The adaptive theory of the code evolution postulates that the structure of the genetic code was shaped under selective forces that made the code maximally robust, i.e., minimize the effect of errors on the structure and function of the synthesized proteins. It is possible to distinguish the ‘lethal-mutation’ hypothesis (63, 64) under which the standard code evolved to minimize the effect of point mutations and the ‘translation-error minimization’ hypothesis (65, 66) which posits that the most important pressure in the code’s evolution was selection for minimization of the effect of the translational misreadings.
    A combination of the two types of forces is conceivable as well. The fact that related codons code for similar amino acids and the experimental observations that mistranslation occurs more frequently in the first and third positions of codons whereas it is the second position that correlates best with amino acid properties were construed as evidence in support of the adaptive theory (65, 67, 68). The translation-error minimization hypothesis also received some statistical support from Monte Carlo simulations (69), which later became a major tool to analyze the degree of optimization of the standard code.
  3. The coevolution theory posits that the structure of the standard code reflects the pathways of amino acid biosynthesis (70). According to this scenario, the code coevolved with the amino acid biosynthetic pathways, i.e., during the code evolution, subsets of codons for precursor amino acids have been reassigned to encode product amino acids. Although the basic idea of the coevolution hypothesis is the same as in Crick’s scenario of code extension, the explicit identification of precursor-product pairs of amino acids and strong statistical support for the inferred precursor-product pairs (70, 71) gained the coevolution theory wide acceptance.
    A complementary approach to the problem of code evolution espouses a “tRNA-centric” view under which the features of the code are determined by different types of co-evolution, namely, that of the codons and the cognate tRNA anticodons (51) or of the codons and aminoacyl-tRNA synthetases (72). This coevolution has been interpreted, primarily, in terms of minimization of the rate and effect of translation errors (51) or with respect to the reduction of coding ambiguity at the early stages of the code evolution (72).

The stereochemical theory: tantalizing hints but no conclusive evidence

Extensive early experimentation has detected, at best, weak and relatively non-specific interactions between amino acids and their cognate triplets (5, 73, 74). Nevertheless, it is not unreasonable to argue that even a relatively weak, moderately selective affinity between codons (anticodons) and the cognate amino acids could have been sufficient to precipitate the emergence of the primordial code that subsequently evolved into the modern code in which the specificity is maintained by much more precise and elaborate, indirect mechanisms involving tRNAs and aminoacyl-tRNA synthetases. Furthermore, it can be argued that interaction between amino acids and triplets are strong enough for detection only within the context of specific RNA structures that ensure the proper conformation of the triplet; this could be the cause of the failure of straightforward experiments with trinucleotides or the corresponding polynucleotides. Indeed, the modern version of the stereochemical theory, the ‘escaped triplet theory’ posits that the primordial code functioned through interactions between amino acids and cognate triplets that resided within amino-acid-binding RNA molecules (75). The experimental observations underlying this theory are that short RNA molecules (aptamers) selected from random sequence mixtures by amino-acid-binding were significantly enriched with cognate triplets for the respective amino acids (76, 77). Among the 8 tested amino acids (phenylalanine, isoleucine, histidine, leucine, glutamine, arginine, tryptophan, and tyrosin) (75), only glutamine showed no correlation between the codon and the selected aptamers. The straightforward statistical test applied in these analyses indicated that the probability to obtain the observed correlation between the codons and the sequences of the selected aptamers due to chance was extremely low; the most convincing results were seen for arginine (75). However, more conservative statistical procedures (applied to earlier aptamer data) suggest that the aptamer-codon correlation could be a statistical artifact (78) (but see (79)).

A different kind of statistical analysis has been employed to calculate how unusual is the standard code, given the aptamer-amino-acid binding data (75, 77). A comparison of the standard code with random alternatives has shown that only a tiny fraction of random codes displayed a stronger correlation with the aptamer selection data than the standard code (the real genetic code has greater codon association than 90.3% random codes, and greater anticodon association than 99.8 random codes). The premises of this calculation can be disputed, however, because the standard code has a highly non-random structure, and one could argue that only comparison with codes of similar structures are relevant, in which case the results of aptamer selection might not come out as being significant.

On the whole, it appears that the aptamer experiments, although suggestive, fail to clinch the case for the stereochemical theory of the code. As noticed above, the affinities are rather weak, so that even the conclusions on their reality hinge on the adopted statistical models. Even more disturbing, for different amino acids, the aptamers show enrichment for either codon or anticodon sequence or even for both (75), a lack of coherence that is hard to reconcile with these interactions being the physical basis of the code.

The adaptive theory: evidence of evolutionary optimization of the code

Quantitative evidence in support of the translation-error minimization hypothesis has been inferred from comparison of the standard code with random alternative codes. For any code its cost can be calculated using the following formula:


where a(c) : CA is a given code, i.e., mapping of 64 codons c [set membership] C to 20 amino acids and stop signal a(c) [set membership] A ; p(c′ | c) is the relative probability to misread codon c as codon c′, and d(a(c′), a(c)) is the cost associated with the exchange of the cognate amino acid a(c) with the misincorporated amino acid a(c′). Under this approach, the less the cost ϕ(a(c)) the more robust the code is with respect to mistranslations, i.e., the greater the code’s fitness.

The first reasonably reliable numerical estimates of the fraction of random codes that are more robust than the standard code have been obtained by Haig and Hurst (16) who showed that, under the assumption that any misreadings between two codons that differ by one nucleotide are equally probable, and if the polar requirement scale (80) is employed as the measure of physicochemical similarity of amino acids, the probability of a random code to be fitter than the standard one is P1 ≈ 10−4. Using a refined cost function that took into account the non-uniformity of codon positions and base-dependent transition bias, Freeland and Hurst have shown that the fraction of random codes that outperforms the standard one is P2 ≈ 10−6, i.e., ‘the genetic code is one in a million’ (81). Subsequent analyses have yielded even higher estimates of error minimization of the standard code (15, 17, 82, 83).

Despite the convincing demonstration of the high robustness to misreadings of the standard code, the translation-error minimization hypothesis seems to have some inherent problems. First, to obtain any estimate of a code’s robustness, it is necessary to specify the exact form of the cost function (I) that, even in its simplest form, consists of a specific matrix of codon misreading probabilities and specific costs associated with the amino acid substitutions. The form of the matrix p(c′| c) proposed by Freeland et al. (81) is widely used (e.g., (15, 8386)) but the supporting data are scarce. In particular, it has been convincingly shown that mistranslation in the first and third codon positions is more common than in the second position (65, 87, 88), but the transitional biased misreading in the second position is hard to justify from the available data. In part, to overcome this problem, Ardell and Sella formulated the first population-genetic model of code evolution where the changes in genomic content of a population are modeled along with the code changes (8991). This approach is a generalization of the adaptive concept of code evolution that unifies the lethal-mutation and translation-error minimization hypotheses and incorporates the well-known fact that, among mutations, transitions are far more frequent than transversions (92, 93). Essentially, the Ardell-Sella model describes coevolution of a code with genes that utilize it to produce proteins and explicitly takes into account the “freezing effect” of genes on a code that is due to the massive deleterious effect of code changes (90). Under this model, evolving codes tend to “freeze” in structures similar to that of the standard code and having similar levels of robustness.

Another problem with the function (I) is that it relies on a measure of physicochemical similarity of amino acids. It is clear that any one such measure cannot be totally adequate. The amino acid substitution matrices such as PAM that are commonly used for amino acid sequence comparison appear not to be suitable for the study of the code evolution because these matrices have been derived from comparison of protein sequences that are encoded by the standard code, and hence cannot be independent of that code (94). Therefore one must use a code-independent matrix derived from a first-principle comparison of physic-chemical properties of amino acids, such as the polar requirement scale (80). However, the number of possible matrices of this kind is enormous, and there are no clear criteria for choosing the “best” one. Thus, arbitrariness is inherent in the matrix selection, and its effect on the conclusions on the level of optimization of a code is hard to assess.

A potentially serious objection to the error-minimization hypothesis (95) is that, although the estimates of P1 and P2 indicate that the standard code outperforms most random alternatives, the number of possible codes that are fitter (more robust) than the standard one is still huge (it should be noted that estimates of the code robustness rely on the employed randomization procedure; the one most frequently used involves shuffling of amino acid assignments between the synonymous codon series that are intrinsic to the standard code, so that 20! ≈ 2.4 · 1018 possible codes are searched; different random code generators can produce substantially different results (86)). It has been suggested that, if selection for minimization of translation error effect was the principal force of code evolution, the relative optimization level for the standard code would be significantly higher than observed (96). The counter argument offered by supporters of the error-minimization hypothesis is that the distribution of random code costs is bell-shaped, where more robust codes form a long tail, so because the process of adaptation is non-linear, approaching the absolute minimum is highly improbable (17).

It has been suggested that the apparent code robustness could be a by-product of evolution that was driven by selective forces that have nothing to do with error minimization (97). Specifically, it has been shown that the non-random assignments of amino acids in the standard code can be almost completely explained by incremental code evolution by codon capture or ambiguity reduction processes. However, this conclusion relies on the exact order of amino acids recruitment to the genetic code (98, 99), primarily, on a specific interpretation of the evolution of biosynthetic pathways for amino acids, which remains a controversial issue.

What is the level of code optimization and how could the code get there?

Regardless of the exact nature of the selective forces that had the greatest effect on the evolution of the code, it is a fact that the standard code is substantially robust to translational misreadings as well as mutations. Thus, is seems to be of considerable importance to determine, as objectively as possible, the level of the code’s optimization. Intriguing questions associated with this problem are how much evolution the standard code underwent and what would be the most likely starting point for such evolution.

Estimates on the total level of code optimization have a long history. The straightforward comparison can be made between the standard code and the most robust code with respect to the mean cost value of random codes. This measure of the optimization level was dubbed the minimization percentage (100, 101); more precisely, MP = (ϕmeanϕstand )/(ϕmeanϕmin), where ϕmean is the mean cost of random codes, ϕstand is the cost of the standard code, ϕmin is the cost of the most optimal code [all values are calculated given a particular cost function of the form (I)]. The minimization percentage of the standard code has been estimated at ~70% when the polar requirement scale is used as the measure of amino acid exchangeability (96, 101). Figure 2 shows an example of a code that was optimized for robustness to translation errors by swapping codon assignments for amino acids to minimize the value of the cost function given by formula (I). With respect to this code, the minimization percentage of the standard code is 78% (this MP value is somewhat higher than those reported by Di Giulio (96) because a more realistic misreading matrix p(c′ | c) was employed).

Figure 2
An optimized genetic code with the same block structure and degeneracy as the standard code obtained as a result of combinatorial optimization of the amino acid assignments to four- and two-codon series. The optimization was performed by using the Great ...

Recently, we explored possible evolutionary trajectories of the genetic code within a limited domain of the vast space of possible codes (only codes that possess the same block structure and the same level of degeneracy as the standard code were analyzed) (86). The assumption behind the choice of this small part of the vast code space is that, at an early stage of the evolution of the code, its block structure was fixed (“froze”) in the current form that could not be changed without a dramatic deleterious effect (a notion that is obviously related to Crick’s frozen accident). Thus, we employed a straightforward, greedy evolutionary algorithm, with elementary steps comprising swaps of amino acid assignments between four-codon or two-codon series, to investigate the level of code optimization. The properties of the standard code were compared with the properties of four sets of random codes (purely random codes, random codes whose robustness is greater than that of the standard code, and two sets of codes that resulted from optimization of the first two sets). Under this model, the code fitness landscape is extremely rugged, so that almost any random code yields its own local maximum. Rather unexpectedly, starting from a random code, the level of optimization of the standard code can be easily achieved with 10–12 evolutionary steps on average, and often, optimization can be continued to reach the level that is attainable when the optimization starts from the standard code. When the starting point is a random code that is more robust than the standard one, the optimization procedure yields much higher levels of optimization than that reachable from the standard code, i.e., the standard code is much closer to its local fitness peak than most of the random codes with similar levels of robustness. Comparison of the standard code with the four described sets of codes shows that the standard code is very close to the set of optimized random codes. Thus, the standard genetic code appears to be a point that is located about half way (measured in the number of codon series swaps) along an upward evolutionary trajectory from a random code to the summit of the respective local peak. Moreover, this peak is rather mediocre, with a huge number of taller peaks existing in the landscape (Fig. 3). It should be emphasized that, under this model, the standard code is not locally stable, that is, it can be readily “improved” by a small perturbation (an additional swap). Thus, under the assumption that the function (I) is an adequate measure of the code fitness, it is hard to attribute the lack of further optimization of the standard code to anything other than frozen accident.

Fig. 3
Evolution of codes in a rugged fitness landscape (a cartoon illustration).

Coevolution theory: a link between the code and amino acid metabolism?

The coevolution theory (reviewed in (71, 102, 103)) postulates that prebiotic synthesis could not produce 20 modern amino acids, so a subset of the amino acids had to be produced through biosynthetic pathways before they could be co-opted into the genetic code and translation; hence coevolution of the code and amino acid metabolism (104). Therefore codon allocations to amino acids could have been guided by metabolic connections between the amino acids. According to the coevolution theory, there were three main phases of amino acid entry into the genetic code: the first (phase 1) amino acids came from prebiotic synthesis, phase 2 amino acids entered the code by means of biosynthesis from the phase 1 amino acids, and phase 3 amino acids are introduced into proteins through post-translational modifications (105). The particular choice of phase 1 amino acids (Fig. 4) is supported by a survey of a variety of criteria used to infer the likely order of amino acid appearance (98) (with one exception), and by the list of amino acids produced by high energy proton irradiation of a carbon monoxide-nitrogen-water mixture (106). Under the coevolution theory, evolution of metabolic pathways is an important source of new amino acids. Given the precursor-product pairs of amino acids, the allocation of amino acids in the standard code is almost impossible to obtain by chance (Fig. 4). Experiments demonstrating that the amino acid composition of proteins is evolvable are construed as supporting the coevolution theory. For instance, it has been shown that Bacillus subtilis could be mutated to replace its Tryptophan by 4-fluoroTrp, and even further to displace Trp completely (107).

Fig. 4
The expansion of the standard code according to the coevolution theory. Phase 1 amino acids are orange, and phase 2 amino acids are green. The numbers show the order of amino acid appearance in the code according to (99). The arrows define 13 precursor-product ...

Two major criticisms of the coevolution theory have been put forward. First, the coevolution scenario is very sensitive to the choice of amino acid precursor-product pairs, and the choice of these pairs is far from being straightforward. Indeed, in the original formulation of the coevolution theory, Wong did not directly use biochemically established relationships between amino acids but instead employed inferred reactions of primordial metabolism that remain debatable (70, 103). Amirnovin (108) generated a large set of random codes and found that, if the original 8 precursor-product pairs proposed by Wong (70) are considered, the standard code shows a substantially higher codon correlation score (a measure that calculates number of adjacent codons coding for precursor-product amino acids) than most of the random codes (only 0.1% of random codes perform better). However, after the pairs Gln-His and Val-Leu are removed (the validity of the latter pair has been questioned (109)), the proportion of better random codes rises to 3.6%, and if the precursor-product pairs are taken from the well-characterized metabolic pathways of E. coli, the proportion that a random code shows a stronger correlation reaches 34%. Second, the biological validity of the statistical analysis of Wong (70) appears dubious (109). Ronneberg et al., together with consistent definition of amino acid precursor-product pairs, suggested that, according to the wobble rule, the genetic code contains not 61 functional codons coding for amino acids, but 45 codons, where each two codons of the form NNY are considered as one because no known tRNA can distinguish codons with U or C in the third base position. Under this assumption, there was no statistical support for the coevolution scenario of the evolution of the code (109) (but see (110)).

Is a compromise scenario plausible?

As discussed above, despite a long history of research and accumulation of considerable circumstantial evidence, none of the three major theories on the nature and evolution of the genetic code is unequivocally supported by the currently available data. It appears premature to claim, e.g., that ‘the coevolution theory is a proven theory’ (103), or ‘There is very significant evidence that cognate codons and/or anticodons are unexpectedly frequent in RNA-binding sites […]. This suggests that a substantial fraction of the genetic code has a stereochemical basis’ (75). Is it conceivable that each of these theories captures some aspects of the code’s origin and evolution, and combined, they could yield a more realistic picture? In principle, it is not difficult to speculate along these lines, for instance, by imagining a scenario whereby first abiogenically synthesized amino acids captured their cognate codons owing to their respective stereochemical affinities, after which the code expanded according to the coevolution theory, and finally, amino acid assignments were adjusted under selection to minimize the effect of translational misreadings and point mutations on the genome. Such a composite theory is extremely flexible and consequently can “explain” just about anything by optimizing the relative contributions of different processes to fit the structure of the standard code. Of course, the falsifiability or, more generally, testability of such an overadjusted scenario become issues of concern. Nevertheless, examination of the specific predictions of each theory might take one some way toward falsification of the composite scenario.

The coevolution scenario implies that the genetic code should be highly robust to mistranslations, simply, because the identified precursor-product pairs consist of physico-chemically similar amino acids (97). However, several detailed analyses have suggested that coevolution alone cannot explain the observed level of robustness of the standard code so that additional evolution under selection for error minimization would be necessary to arrive to the standard code (82, 85, 111). Thus, in terms of the plausibility of a composite scenario, coevolution and error minimization are compatible. However, error minimization also appears to be necessary whereas the necessity of coevolution remains uncertain.

The affinities between cognate triplets and amino acids detected in aptamer selection experiments appear to be independent of the highly optimized amino acid assignments in the standard code table (112). Thus, even if these affinities are relevant for the origin of the code, the error minimization properties of the standard code are still in need of an explanation. The proponents of the stereochemical theory argue that some of the amino acid assignments are stereochemically defined, whereas others have evolved under selective pressure for error minimization, resulting in the observed robustness of the standard code. Indeed, it has been shown that, even when 8–10 amino acid assignments in the standard code table are fixed, there is still plenty of room to produce highly optimized genetic codes (112). However, this mixed stereochemistry-selection scenario seems to clash with some evidence. Perhaps, rather paradoxically, amino acids for which affinities with cognate triplets have been reported, largely, are considered to be late additions to the code: only 4 of the 8 amino acids with reported stereochemical affinities are phase 1 amino acids according to the coevolution theory (Fig. 4). Notably, arginine, the amino acid for which the evidence in support of a stereochemical association with cognate codons appears to be the strongest, is the “worst positioned” amino acid in the code table, i.e., of all amino acids, a change in the codon assignment for arginine results in the greatest increase in the code’s fitness (e.g., (86)). This unusual position of arginine in the code table makes it tempting to consider a different combined scenario of the code’s evolution whereby the early stage of this evolution involved, primarily, selection for error minimization, whereas at a later stage, the code was modified through recruitment of new amino acids that involved the (weak) stereochemical affinities.

Universality of the genetic code and collective evolution

Whether the code reflects biosynthetic pathways according to the coevolution theory or was shaped by adaptive evolutionary forces to minimize the burden caused by improper translated proteins or even to maximize the rate of the adaptive evolution of proteins (113115), a fundamental but often overlooked question is why the code is (almost) universal. Of course, the stereochemical theory, in principle, could offer a simple solution, namely, that the codon assignments in the standard code are unequivocally dictated by the specific affinity between amino acids and their cognate codons. As noticed above, however, the affinities are equivocal and weak, and do not account for the error-minimization property of the code. An alternative could be that the code evolved to (near) perfection in terms of robustness to translational errors or, perhaps, some other optimization criteria, and this (nearly) perfect standard code outcompeted all other versions. We have seen, however, that, at least with respect to error minimization, this is far from being the case (Fig. 3). What remains as an explanation of the code’s universality is some version of frozen accident combined with selection that brought the code to a relatively high robustness that was sufficient for the evolution of complex life.

Under the frozen accident view, the universality of the code can be considered an epiphenomenon of the existence of a unique LUCA. The LUCA must have had a code with at least a minimal fitness compatible with cellular life, and that code was frozen ever since (except for the observed limited variation). The implicit assumption behind this line of reasoning is that LUCA already possessed a translation system that was (nearly) as advanced as the modern version. Indeed, the universality of the key components of the translation system including a nearly complete set of aminoacyl-tRNA synthetases among the extant cellular life forms (116, 117) strongly suggests that the main features of the translation system were fixed at a pre-LUCA stage of evolution.

The recently proposed hypothesis of collective evolution of primordial replicators explains the universality of the code through a combination of froze accident and a distinct type of selection pressure (118, 119). The central idea is that universality of the genetic code is a condition for maintaining the (horizontal) flow of genetic information between communities of primordial replicators, and this information flow is a condition for the evolution of any complex biological entities. Horizontal transfer of replicators would provide the means for the emergence of clusters of similar codes, and these clusters would compete for niches. This idea of collective evolution of ensembles of virus-like genetic entities as a stage in the origin of cellular life apparently goes back to Haldane’s classic paper of 1928 (120) but was subsequently recast in modern terms and expanded (121124), and developed in physical terms (125, 126). Vetsigian et al. (118) explored the fate of the code under collective evolution using a simple evolutionary model which is a generalization of the population-genetic model of code evolution described by Sella and Ardell (90, 91). It has been shown that, taking into consideration the selective advantage of error-minimizing codes, within a community of subpopulations of genetic elements capable of horizontal gene exchange, evolution leads to a nearly universal, highly robust code (118).

Instead of conclusions: How did the code evolve (and will we ever know)?

The writing of this review coincides with the 40th anniversary of Crick’s seminal paper on the evolution of the genetic code (6) that synthesized the preceding research in this area and presciently outlined the principal lines of thinking on this difficult subject. In our opinion, despite extensive and, in many cases, elaborate attempts to model code optimization, ingenious theorizing along the lines of the coevolution theory, and considerable experimentation, very little definitive progress has been made.

Of course, this does not mean there has been no advance in understanding aspects of the code evolution. Some clear conclusions are negative, i.e., allow one to rule out certain a priori plausible possibilities. Thus, many years of experimentation including the latest extensive studies on aptamer selection show that the code is not based on a straightforward stereochemical correspondence between amino acids and their cognate codons (or anticodons). Direct interactions between amino acids and polynucleotides might have been important at some early stages of code’s evolution but hardly could have been the principal factor of the code’s evolution. Almost the same seems to apply to the coevolution theory: the possibility exists that evolution of amino acid metabolism and evolution of the code were, to some extent, linked, but this coevolution cannot fully explain the properties of the code. The verdict on the adaptive theory of code evolution, in particular, the hypothesis that the code was shaped by selection for error minimization, is different: in our view, this is the only concept of the code evolution that can legitimately claim to be positively relevant as (so far) no attempt to explain the observed robustness of the code to translation errors without invoking at least some extent of selection has been convincing. So it does appear that selection for translation error minimization played a substantial role in the evolution of the code to the standard form. However, there is also a flip side to the adaptive theory as the standard code appears not to be particularly outstanding in terms of error minimization and, apparently, easily reachable from a random code with the same block structure. Statements like “the genetic code is one in a million” (or even in 100 million) are technically accurate but can be easily misconstrued should one overlook the fact that there is a huge number of possible codes that are significantly more robust than the standard code that sits on the slope of an unremarkable local peak in an extremely rugged fitness landscape (Fig. 3). Of course, it cannot be ruled out that the fitness functions employed in modeling selection for error minimization (Eq (I) and similar ones) in the evolution of the code are far from being an accurate representation of the “real” optimization criterion. Should that be the case, the general assessment of the entire field of code evolution would have to be particularly somber because that would imply we have no clue as to what is important in a code. However, this does not seem to be a particularly likely possibility. Indeed, recent theoretical and empirical studies on correlations between gene sequence evolution and expression strongly suggest that minimization of the production of potentially toxic misfolded proteins is a crucial factor of evolution (127130). It stands to reason that minimization of protein misfolding has driven evolution concordantly at several levels including protein sequences, codon usage (130) and the genetic code itself. Furthermore, general considerations, stemming from Eigen’s theory of quasispecies and mutational meltdown, indicate that, for any complex life to evolve, sufficient robustness of replication and expression is a pre-requisite (131133). Thus, these more general lines of reasoning from evolutionary biology seem to complement the results of specific modeling of the code’s evolution.

And then, there is, of course, frozen accident, Crick’s famous “non-explanation” that, even after 40 years of increasingly sophisticated research, still appears relevant for the problem of the code’s origin and evolution. Indeed, given the relatively modest optimization level of the standard code, it appears essentially certain that the evolution of the code involved some combination of frozen accident with selection for error minimization. Whether or not other recognized and/or still unknown factors also contributed remains a matter to be addressed in further theoretical, modeling and experimental research.

Before closing this discussion, it makes sense to ask: do the analyses described here, focused on the properties and evolution of the code per se, have the potential to actually solve the enigma of the code’s origin? It appears that such potential is problematic because, out of necessity, to make the problems they address tractable, all studies of the code evolution are performed in formalized and, more or less, artificial settings (be it modeling under a defined set of code transformation or aptamer selection experiments) the relevance of which to the reality of primordial evolution is dubious at best. The hypothesis on the causal connection between the universality of the code and the collective character of primordial evolution characterized by extensive genetic exchange between ensembles of replicators (118) is attractive and appears conceptually important because it takes the study of code evolution from being a purely formal exercise into a broader and more biologically meaningful context. Nevertheless, this proposal, even if quite plausible, is only one facet of a much more general and difficult problem, perhaps, the most formidable problem of all evolutionary biology. Indeed, it stands to reason that any scenario of the code origin and evolution will remain vacuous if not combined with understanding of the origin of the coding principle itself and the translation system that embodies it. At the heart of this problem is a dreary vicious circle: what would be the selective force behind the evolution of the extremely complex translation system before there were functional proteins? And, of course, there could be no proteins without a sufficiently effective translation system. A variety of hypotheses have been proposed in attempts to break the circle (see (132135) and references therein) but so far none of these seems to be sufficiently coherent or enjoys sufficient support to claim the status of a real theory.

It seems that detailed modeling of the code evolution from simpler predecessors such as doublet codes could offer some new windows into the early stages of the evolution of coding (72). Notably, backtracking the standard code to the most likely doublet versions yields codes with an exceptional, nearly maximum error minimization capacity (ASN and EVK, unpublished), an observation that moves selection for error minimization and/or frozen accident at least one step closer to the actual origin of translation. Nevertheless, these and other theoretical approaches lack the ability to take the reconstruction of the evolutionary past beyond the complexity threshold that is required to yield functional proteins, and we must admit that concrete ways to cross that horizon are not currently known.

On the experimental front, findings on the catalytic capabilities of selected ribozymes are impressive (136). In particular, highly efficient self-aminoacylating ribozymes and ribozymes that catalyze the peptidyltransferase reaction have been obtained (137, 138). Moreover, ribozymes whose catalytic activity is stimulated by peptides have been selected (139), hinting at the possible origins of the RNA-protein connection (133). Nevertheless, in a close analogy to the situation with theoretical approaches, we are unaware of any experiments that would have the potential to actually reconstruct the origin of coding, not even at the stage of serious planning.

Summarizing the state of the art in the study of the code evolution, we cannot escape considerable skepticism. It seems that the two-pronged fundamental question: “why is the genetic code the way it is and how did it come to be?”, that was asked over 50 years ago, at the dawn of molecular biology, might remain pertinent even in another 50 years. Our consolation is that we cannot think of a more fundamental problem in biology.


Although the study of the evolution of the genetic code is a relatively well focused field, the literature accumulated over the 50 years of research is extensive, and we could not possibly cover all of it in a brief review article. Our sincere apologies to all colleagues whose relevant work is not cited due to space restrictions. EVK is grateful to Nigel Goldenfeld, Paul Higgs, and Claus Wilke for insightful discussions during the workshop on “Evolution: from Atoms to Organisms” at the Aspen Center for Physics (Aspen, CO), 8/10/2008-8/31/2008. The authors’ research is supported by the Department of Health and Human Services intramural program (NIH, National Library of Medicine).


1. Nirenberg MW, Jones W, Leder P, Clark BFC, Sly WS, Pestka S. On the coding of genetic information. Cold Spring Harb Symp Quant Biol. 1963;28:549–557.
2. Hinegardner RT, Engelberg J. Rationale for a Universal Genetic Code. Science. 1963;142:1083–1055. [PubMed]
3. Woese CR, Hinegardner RT, Engelberg J. Universality in the Genetic Code. Science. 1964;144:1030–1031. [PubMed]
4. Woese CR. Order in the Genetic Code. Proceedings of the National Academy of Sciences. 1965;54(1):71–75. [PubMed]
5. Woese CR. The Genetic Code: The Molecular Basis for Genetic Expression. Harper & Row 1967
6. Crick FH. The origin of the genetic code. J Mol Biol. 1968;38(3):367–79. [PubMed]
7. Ycas M. The Biological Code. Amsterdam: North-Holland; 1969.
8. Chechetkin VR. Block structure and stability of the genetic code. J Theor Biol. 2003;222(2):177–188. [PubMed]
9. Rumer IB. On codon systematization in the genetic code. Dokl Akad Nauk SSSR. 1966;167(6):1393–1394. [PubMed]
10. Vol’kenshtein MV, Rumer IB. Systematics of codons. Biofizika. 1967;12(1):10–13. [PubMed]
11. Wetzel R. Evolution of the aminoacyl-tRNA synthetases and the origin of the genetic code. Journal of Molecular Evolution. 1995;40(5):545–550. [PubMed]
12. Di Giulio M. The origin of the genetic code: theories and their relationships, a review. Biosystems. 2005;80(2):175–184. [PubMed]
13. Hasegawa M, Miyata T. On the Asymmetry of the Amino Acid Code Table. Orig Life. 1980;10:265–270. [PubMed]
14. King JL, Jukes TH. Non-Darwinian evolution. Science. 1969;164(881):788–798. [PubMed]
15. Gilis D, Massar S, Cerf NJ, Rooman M. Optimality of the genetic code with respect to protein stability and amino-acid frequencies. Genome Biol. 2001;2(11):49.1–49.12. [PMC free article] [PubMed]
16. Haig D, Hurst LD. A quantitative measure of error minimization in the genetic code. Journal of Molecular Evolution. 1991;33(5):412–417. [PubMed]
17. Freeland SJ, Wu T, Keulmann N. The case for an error minimizing standard genetic code. Orig Life Evol Biosph. 2003;33(4–5):457–477. [PubMed]
18. Itzkovitz S, Alon U. The genetic code is nearly optimal for allowing additional information within protein-coding sequences. Genome Research. 2007;17(4):405–412. [PubMed]
19. Ambrogelly A, Palioura S, Soll D. Natural expansion of the genetic code. Nat Chem Biol. 2007;3:29–35. [PubMed]
20. Aldana M, Cazarez-Bush F, Cocho G, Martnez-Mekler G. Primordial synthesis machines and the origin of the genetic code. Physica A. 1998;257(1):119–127.
21. Aldana-Gonzalez M, Cocho G, Larralde H, Martinez-Mekler G. Translocation Properties of Primitive Molecular Machines and Their Relevance to the Structure of the Genetic Code. Journal of Theoretical Biology. 2003;220(1):27–45. [PubMed]
22. Gusev VA, Schulze-Makuch D. Genetic code: Lucky chance or fundamental law of nature? Physics of Life Reviews. 2004;1(3):202–229.
23. Patel A. The triplet genetic code had a doublet predecessor. J Theor Biol. 2005;233(4):527–532. [PubMed]
24. Travers A. The evolution of the genetic code revisited. Orig Life Evol Biosph. 2006;36(5–6):549–555. [PubMed]
25. Ikehara K, Niihara Y. Origin and evolutionary process of the genetic code. Curr Med Chem. 2007;14(30):3221–3231. [PubMed]
26. Wu HL, Bagby S, van den Elsen JM. Evolution of the genetic triplet code via two types of doublet codons. J Mol Evol. 2005;61(1):54–64. [PubMed]
27. Szathmary E. Four Letters in the Genetic Alphabet: A Frozen Evolutionary Optimum? Proceedings: Biological Sciences. 1991;245(1313):91–99. [PubMed]
28. Szathmary E. Why are there four letters in the genetic alphabet? Nat Rev Genet. 2003;4(12):995–1001. [PubMed]
29. Weber AL, Miller SL. Reasons for the occurrence of the twenty coded protein amino acids. Journal of Molecular Evolution. 1981;17(5):273–284. [PubMed]
30. Lu Y, Freeland S. On the evolution of the standard amino-acid alphabet. Genome Biol. 2006;7(1):102. [PMC free article] [PubMed]
31. Lu Y, Freeland SJ. A quantitative investigation of the chemical space surrounding amino acid alphabet formation. J Theor Biol. 2008;250(2):349–361. [PubMed]
32. Munteanu A, Attolini CS, Rasmussen S, Ziock H, Sole RV. Generic Darwinian selection in catalytic protocell assemblies. Philos Trans R Soc Lond B Biol Sci. 2007;362(1486):1847–1855. [PMC free article] [PubMed]
33. Barrell BG, Bankier AT, Drouin J. A different genetic code in human mitochondria. Nature. 1979;282(5735):189–194. [PubMed]
34. Knight RD, Freeland SJ, Landweber LF. Selection, history and chemistry: the three faces of the genetic code. Trends Biochem Sci. 1999;24(6):241–247. [PubMed]
35. Knight RD, Freeland SJ, Landweber LF. Rewiring the keyboard: evolvability of the genetic code. Nat Rev Genet. 2001;2:49–58. [PubMed]
36. Yokobori S, Suzuki T, Watanabe K. Genetic Code Variations in Mitochondria: tRNA as a Major Determinant of Genetic Code Plasticity. Journal of Molecular Evolution. 2001;53(4):314–326. [PubMed]
37. Santos MAS, Moura G, Massey SE, Tuite MF. Driving change: the evolution of alternative genetic codes. Trends Genet. 2004;20(2):95–102. [PubMed]
38. Sengupta S, Yang X, Higgs PG. The Mechanisms of Codon Reassignments in Mitochondrial Genetic Codes. Journal of Molecular Evolution. 2007;64(6):662–688. [PMC free article] [PubMed]
39. Santos MAS, Cheesman C, Costa V, Moradas-Ferreira P, Tuite MF. Selective advantages created by codon ambiguity allowed for the evolution of an alternative genetic code in Candida spp. Molecular Microbiology. 1999;31(3):937–947. [PubMed]
40. Giege R, Sissler M, Florentz C. Universal rules and idiosyncratic features in tRNA identity. Nucleic Acids Research. 1998;26(22):5017–5035. [PMC free article] [PubMed]
41. Matsuyama S, Ueda T, Crain PF, McCloskey JA, Watanabe K. A novel wobble rule found in starfish mitochondria. Presence of 7-methylguanosine at the anticodon wobble position expands decoding capability of tRNA. Journal of Biological Chemistry. 1998;273(6):3363–3368. [PubMed]
42. Alfonzo JD, Blanc V, Estévez AM, Rubio MAT, Simpson L. C to U editing of the anticodon of imported mitochondrial tRNA Trp allows decoding of the UGA stop codon in Leishmania tarentolae. The EMBO Journal. 1999;18:7056–7062. [PubMed]
43. Allmang C, Krol A. Selenoprotein synthesis: UGA does not end the story. Biochimie. 2006;88(11):1561–1571. [PubMed]
44. Krzycki JA. The direct genetic encoding of pyrrolysine. Curr Opin Microbiol. 2005;8(6):706–712. [PubMed]
45. Wang L, Xie J, Schultz PG. Expanding the genetic code. Annu Rev Biophys Biomol Struct. 2006;35:225–249. [PubMed]
46. Xie J, Schultz PG. A chemical toolkit for proteins - an expanded genetic code. Nat Rev Mol Cell Biol. 2006;7(10):775–782. [PubMed]
47. Osawa S. Evolution of the genetic code. Oxford University Press; 1995.
48. Osawa S, Jukes TH, Watanabe K, Muto A. Recent evidence for evolution of the genetic code. Microbiology and Molecular Biology Reviews. 1992;56(1):229–264. [PMC free article] [PubMed]
49. Schultz DW, Yarus M. Transfer RNA mutation and the malleability of the genetic code. J Mol Biol. 1994;235(5):1377–1380. [PubMed]
50. Schultz DW, Yarus M. On malleability in the genetic code. J Mol Evol. 1996;42(5):597–601. [PubMed]
51. Chechetkin VR. Genetic code from tRNA point of view. J Theor Biol. 2006;242(4):922–934. [PubMed]
52. Suzuki T, Ueda T, Watanabe K. The “polysemous” codon — a codon with multiple amino acid assignment caused by dual specificity of tRNA identity. The EMBO Journal. 1997;16:1122–1134. [PubMed]
53. Andersson SG, Kurland CG. Genomic evolution drives the evolution of the translation system. Biochem Cell Biol. 1995;73(11–12):775–787. [PubMed]
54. Andersson SGE, Kurland CG. Reductive evolution of resident genomes. Trends Microbiol. 1998;6(7):263–268. [PubMed]
55. Massey SE, Moura G, Beltrao P, Almeida R, Garey JR, Tuite MF, Santos MAS. Comparative Evolutionary Genomics Unveils the Molecular Mechanism of Reassignment of the CTG Codon in Candida spp. Genome Research. 2003;13(4):544–557. [PubMed]
56. Andersson GE, Kurland CG. An extreme codon preference strategy: codon reassignment. Mol Biol Evol. 1991;8(4):530–544. [PubMed]
57. Massey SE, Garey JR. A Comparative Genomics Analysis of Codon Reassignments Reveals a Link with Mitochondrial Proteome Size and a Mechanism of Genetic Code Change Via Suppressor tRNAs. Journal of Molecular Evolution. 2007;64(4):399–410. [PubMed]
58. Woese CR, Fox GE. The concept of cellular evolution. J Mol Evol. 1977;10(1):1–6. [PubMed]
59. Gamow G. Possible relation between deoxyribonucleic acid and protein structures. Nature. 1954;173:318.
60. Pelc SR. Correlation between coding-triplets and amino acids. Nature. 1965;207(4997):597–599. [PubMed]
61. Pelc SR, Welton MGE. Stereochemical relationship between coding triplets and amino-acids. Nature. 1966;209(5026):868–870. [PubMed]
62. Dunnill P. Triplet nucleotide-amino-acid pairing; a stereochemical basis for the division between protein and non-protein amino-acids. Nature. 1966;210(5042):1267–1268. [PubMed]
63. Sonneborn TM. Degeneracy of the genetic code: extent, nature, and genetic implications. Evolving genes and proteins. 1965:377–397.
64. Epstein CJ. Role of the amino-acid “code” and of selection for conformation in the evolution of proteins. Nature. 1966;210(5031):25–28. [PubMed]
65. Woese CR. On the evolution of the genetic code. Proc Natl Acad Sci USA. 1965;54(6):1546–1552. [PubMed]
66. Goldberg AL, Wittes RE. Genetic Code: Aspects of Organization. Science. 1966;153(3734):420. [PubMed]
67. Davies J, Gilbert W, Gorini L. Streptomycin, Suppression, and the Code. Proceedings of the National Academy of Sciences. 1964;51(5):883–890. [PubMed]
68. Friedman SM, Weinstein IB. Lack of fidelity in the translation of ribopolynucleotides. Proc Natl Acad Sci USA. 1964;52:988–996. [PubMed]
69. Alff-Steinberger C. The Genetic Code and Error Transmission. Proceedings of the National Academy of Sciences. 1969;64(2):584–591. [PubMed]
70. Wong JTF. A Co-Evolution Theory of the Genetic Code. Proceedings of the National Academy of Sciences. 1975;72(5):1909–1912. [PubMed]
71. Wong JTF. Coevolution theory of the genetic code at age thirty. BioEssays. 2005;27(4):416–425. [PubMed]
72. Delarue M. An asymmetric underlying rule in the assignment of codons: possible clue to a quick early evolution of the genetic code via successive binary choices. RNA. 2007;13(2):161–169. [PubMed]
73. Woese CR, Dugre DH, Dugre SA, Kondo M, Saxinger WC. On the fundamental nature and evolution of the genetic code. Cold Spring Harb Symp Quant Biol. 1966;31:723–736. [PubMed]
74. Saxinger C, Ponnamperuma C, Woese C. Evidence for the interaction of nucleotides with immobilized amino-acids and its significance for the origin of the genetic code. Nat New Biol. 1971;234(49):172–174. [PubMed]
75. Yarus M, Caporaso JG, Knight R. Origins of the Genetic Code: The Escaped Triplet Theory. Annual Review of Biochemistry. 2005;74:179–198. [PubMed]
76. Knight RD, Landweber LF. Rhyme or reason: RNA-arginine interactions and the genetic code. Chem Biol. 1998;5(9):215–220. [PubMed]
77. Knight RD, Landweber LF, Yarus M. Tests of a Stereochemical Genetic Code. In: Lapointe J, Brakier-Gingras L, editors. Translation Mechanism. Kluwer Academic/Plenum Publishers; New York: 2003. pp. 115–128.
78. Ellington AD, Khrapov M, Shaw CA. The scene of a frozen accident. RNA. 2000;6(04):485–498. [PubMed]
79. Knight RD, Landweber LF. Guilt by association: The arginine case revisited. RNA. 2000;6(04):499–510. [PubMed]
80. Woese CR, Dugre DH, Saxinger WC, Dugre SA. The Molecular Basis for the Genetic Code. Proceedings of the National Academy of Sciences. 1966;55(4):966–974. [PubMed]
81. Freeland SJ. The Genetic Code Is One in a Million. Journal of Molecular Evolution. 1998;47(3):238–248. [PubMed]
82. Freeland SJ, Knight RD, Landweber LF, Hurst LD. Early Fixation of an Optimal Genetic Code. Molecular Biology and Evolution. 2000;17:511–518. [PubMed]
83. Goodarzi H, Nejad HA, Torabi N. On the optimality of the genetic code, with the consideration of termination codons. Biosystems. 2004;77(1–3):163–173. [PubMed]
84. Zhu CT, Zeng XB, Huang WD. Codon Usage Decreases the Error Minimization Within the Genetic Code. Journal of Molecular Evolution. 2003;57(5):533–537. [PubMed]
85. Archetti M. Codon Usage Bias and Mutation Constraints Reduce the Level of Error Minimization of the Genetic Code. Journal of Molecular Evolution. 2004;59(2):258–266. [PubMed]
86. Novozhilov AS, Wolf YI, Koonin EV. Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape. Biology Direct. 2007;2(24) [PMC free article] [PubMed]
87. Parker J. Errors and alternatives in reading the universal genetic code. Microbiology and Molecular Biology Reviews. 1989;53(3):273–298. [PMC free article] [PubMed]
88. Kramer EB, Farabaugh PJ. The frequency of translational misreading errors in E. coli is largely determined by tRNA competition. RNA. 2007;13(1):87–96. [PubMed]
89. Ardell DH. On error minimization in a sequential origin of the standard genetic code. J Mol Evol. 1998;47(1):1–13. [PubMed]
90. Ardell DH, Sella G. No accident: genetic codes freeze in error-correcting patterns of the standard genetic code. Philos Trans R Soc Lond B Biol Sci. 2002;357(1427):1625–1642. [PMC free article] [PubMed]
91. Sella G, Ardell DH. The coevolution of genes and genetic codes: Crick’s frozen accident revisited. J Mol Evol. 2006;63(3):297–313. [PubMed]
92. Collins DW, Jukes TH. Rates of transition and transversion in coding sequences since the human-rodent divergence. Genomics. 1994;20(3):386–396. [PubMed]
93. Kumar S. Patterns of Nucleotide Substitution in Mitochondrial Protein Coding Genes of Vertebrates. Genetics. 1996;143(1):537–548. [PubMed]
94. Di Giulio M. The Origin of the Genetic Code cannot be Studied using Measurements based on the PAM Matrix because this Matrix Reflects the Code Itself, Making any such Analyses Tautologous. Journal of Theoretical Biology. 2001;208(2):141–144. [PubMed]
95. Di Giulio M. The origin of the genetic code. Trends in Biochemical Sciences. 2000;25(2):44. [PubMed]
96. Di Giulio M, Capobianco MR, Medugno M. On the optimization of the physicochemical distances between amino acids in the evolution of the genetic code. J Theor Biol. 1994;168(1):43–51. [PubMed]
97. Stoltzfus A, Yampolsky LY. Amino Acid Exchangeability and the Adaptive Code Hypothesis. J Mol Evol. 2007;65(4):456–462. [PubMed]
98. Trifonov EN. Consensus temporal order of amino acids and evolution of the triplet code. Gene. 2000;261(1):139–151. [PubMed]
99. Trifonov EN. The triplet code from first principles. J Biomol Struct Dyn. 2004;22(1):1–11. [PubMed]
100. Wong JTF. Role of Minimization of Chemical Distances between Amino Acids in the Evolution of the Genetic Code. Proceedings of the National Academy of Sciences. 1980;77(2):1083–1086. [PubMed]
101. Di Giulio M. The extension reached by the minimization of the polarity distances during the evolution of the genetic code. Journal of Molecular Evolution. 1989;29(4):288–293. [PubMed]
102. Di Giulio M. The coevolution theory of the origin of the genetic code. Physics of Life Reviews. 2004;1(2):128–137.
103. Wong JTF. Question 6: Coevolution Theory of the Genetic Code: A Proven Theory. Origins of Life and Evolution of Biospheres. 2007;37(4):403–408. [PubMed]
104. Wong JTF, Bronskill PM. Inadequacy of prebiotic synthesis as origin of proteinous amino acids. Journal of Molecular Evolution. 1979;13(2):115–125. [PubMed]
105. Wong JTF. Coevolution of genetic code and amino acid biosynthesis. Trends Biochem Sci. 1981;6:33–35.
106. Kobayashi K, Tsuchiya M, Oshima T, Yanagawa H. Abiotic synthesis of amino acids and imidazole by proton irradiation of simulated primitive earth atmospheres. Origins of Life and Evolution of the Biosphere. 1990;20(2):99–109.
107. Wong JTF. Membership Mutation of the Genetic Code: Loss of Fitness by Tryptophan. Proceedings of the National Academy of Sciences. 1983;80(20):6303–6306. [PubMed]
108. Amirnovin R. An Analysis of the Metabolic Theory of the Origin of the Genetic Code. Journal of Molecular Evolution. 1997;44(5):473–476. [PubMed]
109. Ronneberg TA, Landweber LF, Freeland SJ. Testing a biosynthetic theory of the genetic code: Fact or artifact? Proceedings of the National Academy of Sciences. 2000;97(25):13690–13695. [PubMed]
110. Di Giulio M. A Blind Empiricism Against the Coevolution Theory of the Origin of the Genetic Code. Journal of Molecular Evolution. 2001;53(6):724–732. [PubMed]
111. Freeland SJ, Hurst LD. Load minimization of the genetic code: history does not explain the pattern. Proceedings of the Royal Society B: Biological Sciences. 1998;265(1410):2111–2119.
112. Caporaso JG, Yarus M, Knight R. Error Minimization and Coding Triplet/Binding Site Associations Are Independent Features of the Canonical Genetic Code. Journal of Molecular Evolution. 2005;61(5):597–607. [PubMed]
113. Maeshiro T, Kimura M. The role of robustness and changeability on the origin and evolution of genetic codes. Proceedings of the National Academy of Sciences. 1998;95(9):5088–5093. [PubMed]
114. Judson OP. The Genetic Code: What Is It Good For? An Analysis of the Effects of Selection Pressures on Genetic Codes. Journal of Molecular Evolution. 1999;49(5):539–550. [PubMed]
115. Zhu W, Freeland S. The standard genetic code enhances adaptive evolution of proteins. J Theor Biol. 2005;239(1):63–70. [PubMed]
116. Koonin EV. Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat Rev Microbiol. 2003;1(2):127–136. [PubMed]
117. Harris JK, Kelley ST, Spiegelman GB, Pace NR. The genetic core of the universal ancestor. Genome Res. 2003;13(3):407–412. [PubMed]
118. Vetsigian K, Woese C, Goldenfeld N. Collective evolution and the genetic code. Proceedings of the National Academy of Sciences. 2006;103(28):10696–10701. [PubMed]
119. Goldenfeld N, Woese C. Connections Biology’s next revolution. Nature. 2007;445:369. [PubMed]
120. Haldane JBS. The Origin of Life. Rationalist Annual. 1928;148:3–10.
121. Anderson NG. Evolutionary significance of virus infection. Nature. 1970;227(5265):1346–1347. [PubMed]
122. Syvanen M. Cross-species gene transfer; implications for a new theory of evolution. J Theor Biol. 1985;112(2):333–343. [PubMed]
123. Syvanen M. Recent emergence of the modern genetic code: a proposal. Trends Genetics. 2002;18:245–248. [PubMed]
124. Woese CR. Interpreting the universal phylogenetic tree. Proceedings of the National Academy of Sciences. 2000;97(15):8392–8396. [PubMed]
125. Koonin EV, Martin W. On the origin of genomes and cells within inorganic compartments. Trends Genet. 2005;21(12):647–654. [PubMed]
126. Martin W, Russell MJ. On the origins of cells: a hypothesis for the evolutionary transitions from abiotic geochemistry to chemoautotrophic prokaryotes, and from prokaryotes to nucleated cells. Philos Trans R Soc Lond B Biol Sci. 2003;358(1429):59–83. [PMC free article] [PubMed]
127. Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH. Why highly expressed proteins evolve slowly. Proc Natl Acad Sci U S A. 2005;102(40):14338–14343. [PubMed]
128. Drummond DA, Raval A, Wilke CO. A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol. 2006;23(2):327–337. [PubMed]
129. Wilke CO, Drummond DA. Population genetics of translational robustness. Genetics. 2006;173(1):473–481. [PubMed]
130. Drummond DA, Wilke CO. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell. 2008;134(2):341–352. [PMC free article] [PubMed]
131. Zintzaras E, Santos M, Szathmary E. “Living” under the challenge of information decay: the stochastic corrector model vs. hypercycles. J Theor Biol. 2002;217(2):167–181. [PubMed]
132. Penny D. An interpretative review of the origin of life research. Biology and Philosophy. 2005;20(4):633–671.
133. Wolf YI, Koonin EV. On the origin of the translation system and the genetic code in the RNA world by means of natural selection, exaptation, and subfunctionalization. Biology Direct. 2007;2(14) [PMC free article] [PubMed]
134. Noller HF. Evolution of ribosomes and translation from an RNA world. In: Gesteland RF, Cech TR, Atkins JF, editors. The RNA World. Cold Spring Harbor, NY: Cold Spring Harbor laboratory press; 2006.
135. Noller HF. The driving force for molecular evolution of translation. RNA. 2004;10(12):1833–1837. [PubMed]
136. Fedor MJ, Williamson JR. The catalytic diversity of RNAs. Nat Rev Mol Cell Biol. 2005;6(5):399–412. [PubMed]
137. Cui Z, Sun L, Zhang B. A peptidyl transferase ribozyme capable of combinatorial peptide synthesis. Bioorg Med Chem. 2004;12(5):927–933. [PubMed]
138. Illangasekare M, Kovalchuke O, Yarus M. Essential structures of a self-aminoacylating RNA. J Mol Biol. 1997;274(4):519–529. [PubMed]
139. Robertson MP, Knudsen SM, Ellington AD. In vitro selection of ribozymes dependent on peptides for activity. RNA. 2004;10(1):114–127. [PubMed]
140. Dueck G. New optimization heuristics: the great deluge algorithm and the record-to-record travel. Journal of Computional Physics. 1993;104:86–92.