Translation is the process by which ribosomes synthesize proteins in cells. Protein synthesis is essential to all organisms, and cells expend a large amount of energy and time on translation. For single-celled organisms, such as bacteria, there is a direct relationship between the rates of cellular processes such as translation and the rate of cell growth and cell division. Therefore, improvements in translation should increase the fitness of the organism. The term ‘translational selection’ refers to selection to optimize the translation process itself rather than selection acting on the functions of the proteins produced by translation. One of the main pieces of evidence for translational selection is the observation that the choice of synonymous codons appears to be influenced by selection in many organisms. Synonymous changes in the gene do not affect the resulting protein but can affect the way that the mRNA is translated by the ribosome.
The speed of translation is one of the key factors on which translational selection can act. Speed has the direct benefit that the proteins required are produced faster, and the secondary benefit that if a given ribosome finishes translation of one sequence, it can begin work on another. Hence, speeding up translation means that the same total protein production rate can be achieved with fewer ribosomes. Synthesis of the ribosomal proteins and RNAs themselves is costly to the cell, so getting the most out of a limited number of ribosomes is important for efficiency. The argument for translational speed/efficiency explains the observation that codon usage is most strongly biased in a relatively small number of genes that are highly expressed in conditions of rapid growth [1]. In E. coli, the concentrations of tRNAs are also found to vary with growth conditions [2] and are found to correlate with the frequencies of codons in highly expressed genes.
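The claim that faster translation allows the same output with fewer ribosomes can be made explicit with a back-of-the-envelope relation. The symbols below are illustrative assumptions and do not come from the cited work: N is the number of active ribosomes, v the mean elongation rate in codons per second, L the mean protein length in codons, and P the total protein production rate. Initiation and termination times are ignored.

```latex
% Each ribosome completes roughly v / L proteins per second, so
P \approx \frac{N\,v}{L}
\qquad\Longrightarrow\qquad
N \approx \frac{P\,L}{v}
% At fixed P and L, doubling the elongation rate v halves the
% number of ribosomes N that the cell must synthesize.
```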
Ribosomal proteins and translational elongation factors are among the most highly expressed genes in bacteria, and are easily recognizable conserved genes in most genomes. These genes are often used as a reference set, and the codon frequencies in these reference genes are used to define measures of codon bias with which to compare the strength of translational selection in different genes. The first of these is the codon adaptation index (CAI), introduced by Sharp and Li [3]. However, codon frequencies can vary because of mutational biases as well as selection. More recent work has used population genetics theory to predict the way that codon frequencies should vary under both mutation and selection, and hence to develop measures of codon bias that distinguish the strength of selection from the underlying mutational bias [4], [5], [6], [7]. These methods look at the difference in codon frequencies between high and low expression genes, rather than simply at the frequencies in the high expression genes. Another measure of translational selection is the tRNA adaptation index (tAI), which weights codons according to how well they match the pool of tRNA genes [8]. However, to do this accurately requires knowledge of the relative rates of pairing of different anticodon-codon combinations, and our own studies [7] have shown that this is a complex issue that goes beyond the simple wobble rules.
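To make the reference-set idea concrete, the following is a minimal sketch of a CAI-style calculation. It assumes the standard genetic code and the usual definition in which each codon's weight is its count in the reference genes divided by the count of the most frequent synonymous codon, with the CAI of a gene taken as the geometric mean of these weights. The function names, the 0.5 pseudocount for unseen codons, and the exclusion of single-codon amino acids are choices made for this illustration, not details taken from the text above.

```python
import math
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split an in-frame coding sequence into codons (trailing bases dropped)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs):
    """Weight of each codon relative to the most used synonym in the reference set."""
    counts = Counter(
        c for s in reference_seqs for c in codons(s) if c in CODON_TABLE
    )
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    weights = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # stop codons, Met and Trp carry no synonymous information
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in family:
            # Assumed pseudocount of 0.5 so unseen codons do not force CAI to zero.
            weights[c] = max(counts[c], 0.5) / top
    return weights


def cai(seq, weights):
    """Geometric mean of the codon weights over the scorable codons of a gene."""
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

A gene built entirely from the preferred codons of the reference set scores 1, and the score falls as rarer synonyms are used. Because the weights come only from the high expression reference genes, a score of this kind cannot by itself separate selection from mutational bias, which is the limitation noted above.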
Further evidence for the importance of translational speed in bacteria is the observation that codon bias is strongest in organisms that have fast growth rates [9]. These same fast-growing organisms are also found to have larger numbers of duplicated copies of tRNA genes [9] and larger numbers of copies of ribosomal RNA operons [5]. Our interpretation is that rapid growth requires rapid translation and hence a high rate of production of rRNAs and tRNAs. This is facilitated by duplication of the RNA genes. There is direct experimental evidence that when mixtures of bacteria are grown in culture together, the colonies that appear most rapidly are those which have the largest number of rRNA operons [10]; thus, having duplicated rRNAs allows a rapid growth response in conditions where food is plentiful. We have shown that selection for translational efficiency can favour genomes with increased numbers of tRNAs and can lead to coevolution of tRNA content and codon usage [6], [7].
Bacterial genomes usually do not have large non-coding regions and, in general, duplicated genes are rare. This suggests that the efficiency of DNA replication is also important to bacteria, and that this keeps genomes from becoming larger than necessary. The fact that tRNA and rRNA genes are often duplicated attests to the importance of these genes. It is interesting to note that ribosomal proteins, which are required in cells in numbers as high as those of ribosomal RNAs, usually have single-copy genes. High levels of proteins can be achieved by optimizing translation from a limited number of mRNAs, whereas high levels of rRNAs and tRNAs can only be achieved by duplicating the genes, and hence increasing transcription.
The other important aspect of translational selection is accuracy. Occasional mis-pairings between codon and anticodon may occur during translation, leading to errors in the protein sequence. This is wasteful if the protein is no longer functional, and may actually be harmful if mistranslated proteins misfold to structures that are toxic, as has been suggested [11]. If errors in translation are sufficiently frequent and sufficiently harmful, and if the error rate differs among synonymous codons, then selection may choose codons that have the lowest error rate. A signature of selection for accuracy is that codon frequencies differ between conserved and variable sites within the same genes [12], [13]. It is presumed that sites that are evolutionarily conserved between species are particularly important for protein function. Thus, accurate translation of these sites should be particularly important, and the frequency of the most accurate codons should be higher at the conserved sites.
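The conserved-versus-variable comparison can be illustrated with a short sketch that tallies the codons of one focal species separately for the two classes of site. It assumes codon-aligned, gap-free orthologous coding sequences and treats a site as conserved when every other species encodes the same amino acid as the focal species. That definition, the function name `split_codon_counts`, and the input format are assumptions made for this example; the cited studies may define conservation differently.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codon_counts(focal, others):
    """Count focal-species codons at conserved and at variable amino acid sites.

    focal  : in-frame coding sequence of the species of interest
    others : aligned orthologous sequences (same length, no gaps assumed)
    """
    conserved, variable = Counter(), Counter()
    usable = len(focal) - len(focal) % 3
    for i in range(0, usable, 3):
        codon = focal[i:i + 3].upper()
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":
            continue  # skip ambiguous codons and stop codons
        # Assumed definition: conserved means the same amino acid in every ortholog.
        same = all(CODON_TABLE.get(o[i:i + 3].upper()) == aa for o in others)
        (conserved if same else variable)[codon] += 1
    return conserved, variable
```

Under selection for accuracy, the codons presumed to be most accurate would be expected to make up a larger share of the `conserved` counts than of the `variable` counts for the same amino acid.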
There seems to be clear evidence that both the speed and accuracy of translation can differ between synonymous codons. We have previously discussed many of the specific details of codon-anticodon interactions that influence which codons are preferred as a function of which tRNAs are present in an organism [7]. We also reviewed the way in which modified bases on the tRNA influence translational speed and the ability of tRNAs to distinguish between correct and incorrect codons. Our theoretical interpretation of the codon frequency data [6], [7] has been primarily in terms of selection for speed; however, given the evidence that accuracy is also important, it is of interest to look for evidence of codon bias due to selection for both accuracy and speed in the same gene sequences and the same organisms. In this paper we will develop a statistical test to detect differences in codon frequencies between any two sets of codons, and to measure the extent of these differences. We will apply this test to the comparison of codon frequencies in high and low expression genes, and to the comparison of codon frequencies in conserved and variable sites within high expression genes. By comparing these factors in the same set of organisms, we are able to make a useful comparison of the two main causes of translational selection across many species.
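As a point of orientation only, the kind of comparison described here can be sketched with a generic Pearson chi-square test of homogeneity applied to the synonymous codon counts of one amino acid in two sets. This is a standard textbook stand-in and is not the test developed in this paper; the function name and the dictionary input format are assumptions for the example.

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Pearson chi-square statistic comparing codon counts in two sets.

    counts_a, counts_b : dicts mapping synonymous codons of one amino acid
                         to their observed counts in set A and set B.
    Returns (statistic, degrees_of_freedom).
    """
    # Keep only codons observed at least once in either set.
    codons = sorted(
        c for c in set(counts_a) | set(counts_b)
        if counts_a.get(c, 0) + counts_b.get(c, 0) > 0
    )
    total_a = sum(counts_a.get(c, 0) for c in codons)
    total_b = sum(counts_b.get(c, 0) for c in codons)
    if len(codons) < 2 or total_a == 0 or total_b == 0:
        raise ValueError("need at least two observed codons and two non-empty sets")
    grand = total_a + total_b
    stat = 0.0
    for c in codons:
        column = counts_a.get(c, 0) + counts_b.get(c, 0)
        for observed, row in ((counts_a.get(c, 0), total_a),
                              (counts_b.get(c, 0), total_b)):
            # Expected count if both sets shared the same codon frequencies.
            expected = row * column / grand
            stat += (observed - expected) ** 2 / expected
    return stat, len(codons) - 1
```

For example, `chi_square_homogeneity({"GCT": 30, "GCC": 10}, {"GCT": 10, "GCC": 30})` compares two sets with opposite preferences between two alanine codons. A statistic of this kind detects a difference but does not by itself measure its extent, which is the second requirement stated above.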