The standard genetic code consists of 61 sense codons and 3 stop codons, coding for 20 amino acids and a termination signal. The molecules that carry out this mapping during translation are the transfer RNAs (tRNAs). At one end of the tRNA, an amino acid is attached by a dedicated aminoacyl-tRNA synthetase (ARS). This coupling of each amino acid to the appropriate tRNA molecule by its ARS is pivotal for the mapping of the genetic code (Ibba and Söll, 2000). At the other end of the tRNA is the anticodon, which binds to specific codons of the messenger RNA (mRNA) at the A-site of the ribosome, ensuring that the correct amino acid is decoded. There are fewer tRNAs than codons, because non-standard base pairing allows one tRNA molecule to read multiple codons, thereby reducing the number of tRNAs necessary for reading all codons. Depending on the organism, the number of different tRNA isoacceptors ranges from 23 (allowing fully degenerate binding at the third codon position) to 45 (allowing only degenerate binding of pyrimidines). This anticodon–codon mapping of the tRNAs to the mRNA is the focus of this study (see ). We first investigate the imprint of the decoding properties of tRNA on codon usage bias; we then exploit this signal to infer the codon reading in yeast.
Codon usage bias is the non-random, unequal usage of synonymous codons. It is shaped by the balance of mutational bias and selection (Bulmer, 1991). Mutational biases originating from DNA processes (replication and repair errors, etc.) generate variation in the sequences. This sequence variation is under various selective pressures. One prominent pattern of selection is the co-evolution of tRNA abundance and codon usage bias (Dong et al., 1996). Many of the different patterns are induced by the physiological constraints of translational selection (Plotkin and Kudla, 2011). The relative strength and importance of the factors that influence codon usage bias have not yet been resolved. Given the central role of tRNA in translation, it is plausible that a signal caused by the decoding properties of tRNA is detectable by analysis of codon usage bias.
The decoding properties of tRNA are governed by several factors. In essence, the ribosome constrains the first and the second position of the codon to strict cognate (Watson–Crick) binding, but monitors the third position of the codon less stringently (Ogle and Ramakrishnan, 2005). The third position of the codon and the first position of the anticodon are thus referred to as the wobble position. Driven by the necessity for stable protein synthesis, the nucleotides at the wobble position of the tRNA are often chemically modified to alter the specificity of the binding (Agris et al., 2007). The cell makes a comparatively large investment in genes for tRNA modifications to maintain tRNA stability, to aid recognition by the corresponding ARS and to ensure accurate codon reading (de Crécy-Lagard, 2007).
The assignment of codon reading is important for several methods in sequence analysis. Some indices of codon usage depend on knowledge of the specific anticodon–codon mapping [e.g. tAI (dos Reis et al., 2003) and TPI (Friberg et al., 2006)]. The mapping may help to understand the evolution of the genetic code and to find potential targets for re-engineering the genetic code to incorporate non-natural amino acids into proteins (Moura et al., 2010). The most conclusive way of determining the binding characteristics (i.e. accuracy and efficacy) is by experiment (Björk and Hagervall, 2005; Johansson et al., 2008). Unfortunately, experimental determination of tRNA properties is expensive and time-consuming. The ability to predict the decoding properties of tRNA is therefore a highly desirable alternative to experimental assignment.
Using knowledge of the chemistry of nucleotide base pairing, Francis Crick proposed a scheme for the decoding properties of tRNA (Crick, 1966). These are the wobble rules, which remain a common description of the anticodon–codon mapping. The original version of the wobble rules can briefly be described as: A:U, C:G, G:{C, U}, U:{A, G} and, in the case of an adenine modified to inosine, I:{A, C, U}; the latter was the only known modification when the wobble rules were devised. The notation is read as follows: the base at the first position of the anticodon precedes the colon, and the set of bases at the third position of the codon that it can pair with follows the colon. For example, U:{A, G} means that a uracil at the wobble position of the tRNA can bind either an adenine or a guanine. The original wobble rules are summarized in .
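As an illustration of this notation, the original wobble rules can be written as a lookup table from the wobble base of the anticodon to the set of third codon bases it reads. The following Python sketch is illustrative only: the names `WOBBLE_RULES`, `WATSON_CRICK` and `codons_read` are hypothetical, and it assumes that anticodons and codons are both written 5'→3', so that the first anticodon base pairs with the third codon base.

```python
# Original wobble rules as listed in the text (anticodon wobble base -> third codon bases).
WOBBLE_RULES = {
    "A": {"U"},
    "C": {"G"},
    "G": {"C", "U"},
    "U": {"A", "G"},
    "I": {"A", "C", "U"},  # inosine (modified adenine)
}

# Strict Watson-Crick pairing, assumed here for codon positions 1 and 2.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon, rules=WOBBLE_RULES):
    """Return the sorted list of codons (5'->3') read by an anticodon (5'->3')."""
    wobble, middle, last = anticodon
    # The last anticodon base pairs with codon position 1, the middle base with position 2.
    prefix = WATSON_CRICK[last] + WATSON_CRICK[middle]
    return sorted(prefix + third for third in rules[wobble])


print(codons_read("GAA"))  # ['UUC', 'UUU']
print(codons_read("IGC"))  # ['GCA', 'GCC', 'GCU']
```

Passing a different `rules` table would allow updated or restricted variants of the wobble rules to be compared on the same anticodons.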
Over time, as new tRNA modifications were discovered, several cases were found in which the original wobble rules failed to describe the true codon reading (Agris, 1991; Agris et al., 2007; Guthrie and Abelson, 1982; Yokoyama and Nishimura, 1995). In response, the original wobble rules were updated several times to accommodate the newly discovered codon readings. For example, in the restricted version of the wobble rules for eukaryotes, the I:A wobble is not allowed (Guthrie and Abelson, 1982). Observed possibilities of codon reading are summarized in . It turns out that finding simple ‘rules’ for codon reading is not trivial and may be impossible. To date, there is no generalization of the wobble rules, even when the base modifications are known.
One method extending the wobble rules that is worth mentioning is the wobble parsimony method (Percudani et al., 1997; Percudani, 2001). This method infers the codon reading assignment in eukaryotes from genomic data. It uses the wobble rules and adds knowledge of which tRNA isoacceptors are present in a given organism, information that can be extracted rather easily from the complete genome (Lowe and Eddy, 1997). Wobble parsimony assumes that a codon for which a cognate tRNA is present has only canonical decoding. Extended codon reading by a tRNA is assumed only if a codon does not have a cognate tRNA. The rules that specify wobble parsimony in eukaryotes are: (i) codons with cognate tRNAs are assigned canonical decoding, (ii) codons without a cognate tRNA are assigned following the principle of restricted wobbling (G:U, A:C) and (iii) extended pairings (A:A, U:G) are assumed for codons that remain unassigned. Wobble parsimony is applicable only to eukaryotes and not to bacteria, where tRNAs commonly recognize more codons than in eukaryotes.
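The three rules can be sketched for a single codon box as follows. This is a simplified illustration of the rules as stated above, not the published implementation: the function name `wobble_parsimony` and the representation of a box by the set of wobble bases of its tRNA genes are assumptions made for this example, and base modifications are ignored.

```python
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Pairings from rules (ii) and (iii), written in the text as anticodon:codon.
# Stored here as: third codon base -> wobble base of the reading anticodon.
RESTRICTED = {"U": "G", "C": "A"}  # G:U, A:C
EXTENDED = {"A": "A", "G": "U"}    # A:A, U:G


def wobble_parsimony(present_wobble_bases):
    """Assign to each third codon base of one box the wobble base that reads it.

    `present_wobble_bases` holds the wobble bases of the tRNA genes found
    for this box; codons that no rule can assign are mapped to None.
    """
    present = set(present_wobble_bases)
    assignment = {}
    for third in "UCAG":
        cognate = WATSON_CRICK[third]
        if cognate in present:                    # rule (i): canonical decoding
            assignment[third] = cognate
        elif RESTRICTED.get(third) in present:    # rule (ii): restricted wobbling
            assignment[third] = RESTRICTED[third]
        elif EXTENDED.get(third) in present:      # rule (iii): extended pairing
            assignment[third] = EXTENDED[third]
        else:
            assignment[third] = None
    return assignment


# Hypothetical box in which only tRNA genes with wobble bases A and U are present.
print(wobble_parsimony({"A", "U"}))
# {'U': 'A', 'C': 'A', 'A': 'U', 'G': 'U'}
```

Because each codon is tested against the rules in the order (i), (ii), (iii), a wobble or extended reading is assigned only when no cognate tRNA is present, which is the parsimony assumption described above.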
Another sequence-based approach to infer trends in the codon reading of tRNA is based on the analysis of tRNA genes and tRNA-modifying enzymes across species (Grosjean et al., 2010). Four major decoding strategies across the three domains of life have been identified from the presence/absence patterns of tRNAs and of genes involved in specific nucleotide modifications. It is assumed that the proteins for nucleotide modifications have evolved to optimize the specificity and efficiency of translation. A limitation of the method is that there are several exceptions to the standard modes of decoding, in particular for uracil modifications, where it is difficult to distinguish the codon reading. For example, in a four-codon box, there is no need to prevent the U*:U reading, although it is not clear whether this reading is viable.
In this study, we use a novel approach to detect the imprint of tRNA-decoding properties in codon usage bias and to infer the codon reading of tRNA. We propose a hidden Markov model (HMM) method and two auxiliary methods: regression (REG) and codon correlation (CC). The results show that the best predictions rank high in the distribution of all solutions and that the three methods generally agree in their predictions. The predictions outperform the wobble rules on experimentally verified codon readings.