Diversity of tRNA isodecoder genes
tRNAScan-SE is among the most successful programs for identifying non-coding RNAs in genome sequences (9). tRNAScan-SE first uses tRNAscan 1.4 and EufindtRNA to search for conserved tRNA sequences at defined positions, then evaluates covariance in conserved tRNA sequence and secondary structure. The algorithm detects 99–100% of tRNAs with a very low error rate (one false positive per 15 Gb).
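To put the quoted error rate in perspective, the short sketch below converts a rate of one false positive per 15 Gb into an expected number of spurious tRNA calls for a genome of a given size. It assumes, for illustration only, that the rate applies uniformly to every base searched, and it uses an approximate haploid human genome size of 3.1 Gb; it is not a description of how tRNAscan-SE itself reports error rates.

```python
# Back-of-envelope estimate of spurious tRNA predictions.
# Assumption (illustrative): the quoted rate of one false positive per
# 15 Gb applies uniformly to every base searched.
FALSE_POSITIVES_PER_BASE = 1 / 15e9


def expected_false_positives(genome_size_bp: float) -> float:
    """Expected number of false-positive tRNA calls for a genome of this size."""
    return genome_size_bp * FALSE_POSITIVES_PER_BASE


if __name__ == "__main__":
    human_genome_bp = 3.1e9  # approximate haploid human genome (assumption)
    estimate = expected_false_positives(human_genome_bp)
    print(f"Expected false positives: {estimate:.2f}")  # -> 0.21
```

Under these assumptions, fewer than one spurious call is expected per mammalian-sized genome, which is small compared with the several hundred tRNA genes predicted per genome.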
Eukaryotic genomes generally contain several hundred tRNA genes as predicted by tRNAScan-SE (http://lowelab.ucsc.edu/GtRNAdb/). tRNAScan-SE can also identify tRNA pseudogenes, which range from ~170 in the human genome to ~22 000 in the mouse genome. There are, however, significant outliers. Danio rerio (zebrafish) has ~6000 predicted tRNA genes. Canis familiaris (dog) contains ~400 genes for tRNALys with anticodon CTT, which are excluded from our analysis to avoid unnecessary bias.
We chose to focus on tRNA sequences from 11 eukaryotic genomes because they span a wide range of the phylogenetic tree and encompass many model organisms ( and Table S1). These 11 genomes have predicted tRNA gene counts ranging from 171 (fission yeast) to 568 (worm). The number of tRNA isoacceptors among these 11 species ranges from 41 (budding yeast) to 55 (chimp).
Figure 1 tRNA genes and isodecoder genes in 11 eukaryotes. (A) Cladogram of the organisms based on the NCBI taxonomy browser (40,41), which include two single-cell yeasts, worm, fruit fly, fugu, chicken and five mammals (dog, rat, mouse, chimp and human). The fraction …
What is remarkable, however, and was not predicted before genome sequencing, is the number of tRNA genes that share the same anticodon sequence but differ elsewhere in the tRNA body (). We tentatively use the term ‘tRNA isodecoder gene’ to describe these tRNA sequences. tRNA isodecoders have the same anticodon sequence (hence they decode the same codons), i.e. they belong to the same isoacceptor class, but have sequence differences elsewhere in the tRNA body. One tRNA sequence within each isoacceptor class, generally the one with the highest gene copy number, is arbitrarily designated as the majority member. The number of tRNA isodecoder genes within an isoacceptor class is the number of distinct tRNA sequences within the class, excluding the majority member. For example, the human tRNAArg(ACG) isoacceptor class has two different sequences, with four and three gene copies, respectively. They differ by a single nucleotide at position 50. The four-copy tRNAArg(ACG) is the majority member and carries U50, and the three-copy tRNAArg(ACG) is classified as the isodecoder gene and carries C50. The number of tRNA isodecoder genes is therefore one for the tRNAArg(ACG) isoacceptor class. By this account, the total number of different tRNA gene sequences in each of these 11 genomes is the number of isoacceptors (from 41 to 55) plus the number of isodecoders (from 10 to 246).
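The counting rule above can be expressed as a short sketch. It assumes the input is one (isoacceptor class, sequence) record per gene copy; the placeholder sequence strings stand in for the real human tRNAArg(ACG) sequences, which are not reproduced here. Because only distinct sequences are counted, the result does not depend on which sequence is picked as the majority member.

```python
from collections import Counter, defaultdict


def count_isodecoders(genes):
    """Number of isodecoder genes per isoacceptor class.

    `genes` is an iterable of (isoacceptor_class, sequence) pairs, one per
    gene copy. Each class contributes (distinct sequences - 1): every distinct
    sequence other than the majority member is one isodecoder gene,
    regardless of its copy number.
    """
    copies = defaultdict(Counter)
    for isoacceptor, sequence in genes:
        copies[isoacceptor][sequence] += 1
    return {isoacceptor: len(seqs) - 1 for isoacceptor, seqs in copies.items()}


def distinct_sequences(genes):
    """Total distinct tRNA sequences = isoacceptor classes + isodecoder genes."""
    counts = count_isodecoders(genes)
    return len(counts) + sum(counts.values())


if __name__ == "__main__":
    # Placeholder sequences (hypothetical): 4 copies with U50, 3 with C50.
    arg_acg = [("Arg(ACG)", "seq-U50")] * 4 + [("Arg(ACG)", "seq-C50")] * 3
    print(count_isodecoders(arg_acg))   # {'Arg(ACG)': 1}
    print(distinct_sequences(arg_acg))  # 2
```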
The fraction of tRNA isodecoder genes (the sum of all isodecoder genes divided by the total number of tRNA genes) falls into distinct groups among these 11 species when plotted on a cladogram (). This fraction is <10% in budding and fission yeast, 12–18% in fruit fly and worm, and rises to 35–46% in fugu, chicken, dog, rat and mouse. It is highest in the two primates, where >50% of tRNA genes are isodecoder genes. This phylogenetic grouping indicates that the diversity of tRNA isodecoder genes cannot simply be attributed to inaccuracies in genome sequencing (although a small number of them may reflect lower sequencing accuracy in some genomes). That the fraction of tRNA isodecoder genes tracks the phylogenetic grouping of these organisms may suggest that these genes perform some previously under-appreciated functions. It may also be a result of genome expansion.
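The fraction defined above can be computed from per-class copy numbers as in the sketch below. The input format (a mapping from isoacceptor class to the copy numbers of its distinct sequences) is an assumption for illustration; the tRNAArg(ACG) entry uses the copy numbers given earlier, while the second class is hypothetical.

```python
def isodecoder_fraction(classes):
    """Fraction of tRNA genes that are isodecoder genes.

    `classes` maps each isoacceptor class to the copy numbers of its distinct
    sequences, e.g. {"Arg(ACG)": [4, 3]}. The numerator counts distinct
    non-majority sequences (isodecoder genes); the denominator counts every
    tRNA gene copy.
    """
    isodecoders = sum(len(copies) - 1 for copies in classes.values())
    total_genes = sum(sum(copies) for copies in classes.values())
    return isodecoders / total_genes


if __name__ == "__main__":
    genome = {
        "Arg(ACG)": [4, 3],     # copy numbers from the human example above
        "Xaa(NNN)": [6, 1, 1],  # hypothetical class
    }
    print(f"{isodecoder_fraction(genome):.2f}")  # 0.20
```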
We further analyzed the sequence features of tRNA isodecoder genes in six commonly studied species: budding yeast, worm, fruit fly, mouse, chimp and human ( and ). The number of tRNA isoacceptors in these species ranges from 41 to 55, and each isoacceptor occurs between 1 and 60 times in a genome (). Budding yeast and fruit fly have relatively few tRNA genes (270–290), and the copy number of each isoacceptor is correspondingly low. Worm has a high number of tRNA genes (568), and its isoacceptor copy numbers are broadly distributed. In mammals, a few isoacceptors have high copy numbers that distinguish them from the other isoacceptors.
Figure 2 Gene copy numbers of tRNA isoacceptors versus the number of occurrences or the number of isodecoders. (A) Plot of the gene copy number of tRNA isoacceptors against the number of occurrences for each isoacceptor class. (B) Plot of the gene copy number of tRNA …
Figure 3 Comparative sequence analysis of the tRNASer(AGA) isoacceptor family across six species. S.c.: budding yeast; C.e.: worm; D.m.: fruit fly; M.m.: mouse; P.t.: chimpanzee; H.s.: human. ‘-Nx’ indicates the gene copy number. For the non-mammalian …
The number of tRNA isodecoder genes varies from very low (10 in yeast) to very high (225–246 in chimp and human). In mammals, the number of tRNA isodecoder genes correlates well and linearly (R-value of 0.89–0.92 and slope of 0.44–0.66) with the gene copy number of the corresponding isoacceptors (). The highest possible slope in this plot is 1.0, reached when every tRNA gene has a unique sequence, so that each isoacceptor class with n gene copies has n − 1 isodecoders after the majority member is subtracted. A slope of 0.64 therefore shows that most human and chimp tRNA genes are unique. For the non-mammalian species, the linear correlation is weaker, with markedly lower R-values (0.24–0.64) and smaller slopes (0.02–0.13). This result suggests that the evolutionary appearance of tRNA isodecoder genes in non-mammals may be less directed than in mammals.
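The slope and R-value quoted above come from fitting isodecoder counts against isoacceptor copy numbers. The sketch below is an ordinary least-squares fit with an intercept plus a Pearson R; the exact fitting procedure behind the figure is not stated here, so treat this as an assumption. The synthetic input illustrates the 1.0 ceiling: if every gene other than the majority member were unique, each class with n copies would have n − 1 isodecoders.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept, plus Pearson R."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    # Synthetic upper bound: every non-majority gene copy is a unique sequence.
    copy_numbers = [1, 2, 5, 9, 20]
    isodecoders = [n - 1 for n in copy_numbers]
    slope, intercept, r = linear_fit(copy_numbers, isodecoders)
    print(f"slope={slope:.2f} intercept={intercept:.2f} R={r:.2f}")
    # slope=1.00 intercept=-1.00 R=1.00
```

Real genomes fall below this line because identical gene copies add to the copy number without adding isodecoders.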
The same description of tRNA isodecoder genes can be applied to bacterial tRNA genes in species with sequenced genomes (Supplementary Table S2 and Supplementary Figure S1). As described in the Introduction, E. coli K12 has 6 tRNA isodecoder genes among 86 tRNA genes (6/86 = 7%). Among the 139 bacterial genome sequences, the number of isodecoder genes ranges from 0 to 26 and the fraction of tRNA isodecoder genes from 0 to 0.30. The great majority of species cluster in the lower regime of the tRNA gene versus isodecoder gene plot (Supplementary Figure S1).
An in-depth sequence analysis of the tRNASer(AGA) isoacceptor class in the six eukaryotic organisms is shown in Figure 3. This isoacceptor was chosen for ease of comparison and for the number of isodecoder genes it has in each species. It has 11 gene copies in yeast, 15 in worm, 8 in fruit fly and mouse, and 9 in chimp and human. The yeast genes comprise two sequence variants: 10 identical copies plus one distinct isodecoder gene. Worm has three sequence variants: 13 identical copies plus two isodecoders. Fruit fly, mouse and chimp have two isodecoders each, and human has three. The yeast tRNA sequences are noticeably different from all others. All worm and fruit fly tRNA genes cluster among themselves. The mammalian sequences, in contrast, cluster more closely by isodecoder than by species. In fact, the majority sequence (six copies in each species) is identical in these mammals.
Most sequence changes in the tRNASer(AGA) isodecoder genes do not alter the secondary or tertiary structure of the tRNA. The fruit fly isodecoder genes involve an A–U to G–U pair change in the acceptor stem and a C-to-U change in the variable region of the D-loop. The sequence change in one worm isodecoder gene is an A–U to G–U change in the stem of the long variable loop. Most sequence changes in the mammalian isodecoder genes are A–U to G–U or G–U to G–C changes in various stems, or a U-to-C change in an unpaired region of the variable loop. The two exceptions are a worm isodecoder gene (Ce2), in which an A–U pair in the acceptor stem becomes a C*U mismatch, and a mouse isodecoder gene (Mm2), in which a G–C pair in the TΨC stem becomes an A*C mismatch.
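The structural reasoning above rests on classifying each stem pair before and after a change. A minimal sketch of that classification follows. It treats only A–U and G–C as Watson–Crick pairs and G–U as a wobble pair, and labels everything else a mismatch, including A–C (even though a protonated A–C pair can mimic G–U). The example changes are the ones described for the tRNASer(AGA) isodecoders.

```python
# Pair classes used for stem changes; DNA-style 'T' is read as 'U'.
WATSON_CRICK = {frozenset("AU"), frozenset("GC")}
WOBBLE = {frozenset("GU")}


def pair_type(a: str, b: str) -> str:
    """Classify a stem base pair as 'watson-crick', 'wobble' or 'mismatch'."""
    pair = frozenset(base.upper().replace("T", "U") for base in (a, b))
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"


if __name__ == "__main__":
    # (before, after) stem pairs for tRNASer(AGA) isodecoder changes in the text.
    changes = {
        "fruit fly, acceptor stem": (("A", "U"), ("G", "U")),
        "worm Ce2, acceptor stem": (("A", "U"), ("C", "U")),
        "mouse Mm2, TΨC stem": (("G", "C"), ("A", "C")),
    }
    for label, (before, after) in changes.items():
        print(f"{label}: {pair_type(*before)} -> {pair_type(*after)}")
```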
Human tRNA isodecoder genes
We further analyzed in detail the locations of sequence changes in human tRNA isodecoder genes (). Eukaryotic tRNA genes are transcribed by RNA polymerase III, and part of the Pol-III promoter lies within the tRNA gene itself (22). The internal promoter consists of two discrete regions corresponding to nt 8–19 (box A) and 52–62 (box B) of a tRNA. Nucleotides 8, 14, 18 and 19 in box A, and nt 53–56, 58 and 61 in box B, are highly conserved among all tRNAs because of their roles in tRNA tertiary structure. Hence, only ~7 nt within box A and 6 nt within box B are variable. Human tRNA genes vary at 6.4% of these variable nt in box A and at 12.3% in box B (). These sequence differences may lead to differential tRNA expression across human tissues or developmental stages.
Figure 4 Frequency of human isodecoder gene variations. Percentages indicate observed changes in each region divided by the total number of nucleotides assessed in that region. (A) Percent sequence variations in the A and B boxes which correspond to internal promoter …
Sequence changes in tRNA isodecoder genes can also be classified into nine regions according to tRNA secondary structure (). The number of sequence changes is determined by comparing each isodecoder with the majority variant. The frequency of sequence changes is the number of changes in each of the nine regions divided by the total number of nucleotides surveyed in that region. The highest frequency of sequence changes is found among the three non-conserved residues in the TΨC loop: of the 675 nt at these positions, 104 differ in isodecoders, so 15.4% of these nucleotides vary. The next most variable region is the D-loop (10.8%), at positions 15, 16–17 (variable from 1 to 3 nt) and 20 (variable from 1 to 3 nt). These high-frequency regions overlap with the A and B boxes that constitute the internal promoter for Pol-III transcription. The frequency of sequence changes in the stems ranges from 2.6 to 9.4%. More than four-fifths of the sequence changes in the stems preserve Watson–Crick or G–U wobble pairing. Of the remaining changes, which disrupt Watson–Crick or G–U pairing, 42% (30/72) create A–C pairs. A–C pairs in RNA helices have pKa values of 6.0–6.5, and protonated A–C pairs are structurally analogous to, and as stable as, G–U pairs (24). A–C pairs in tRNA stems have been found to be functional in some bacterial tRNAs (26). The function of tRNA isodecoders containing A–C pairs may depend on local pH, which can vary among subcellular environments.
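The pH dependence suggested above can be illustrated with the Henderson–Hasselbalch relation for a single titratable site, using the pKa range of 6.0–6.5 quoted for A–C pairs. The sketch treats the A–C pair as one isolated protonation site and uses illustrative pH values; in a real helix the effective pKa depends on sequence context, so this is only a rough guide.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a single titratable site that is protonated at a given pH.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10 ** (ph - pka))


if __name__ == "__main__":
    # pKa bounds quoted for A-C pairs; pH values are illustrative only.
    for pka in (6.0, 6.5):
        for ph in (5.0, 6.0, 7.0, 7.5):
            share = fraction_protonated(ph, pka)
            print(f"pKa {pka}, pH {ph}: {share:.2f} protonated")
```

Within these assumptions, a site is 50% protonated when pH equals pKa, and about 91% protonated one pH unit below it.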
An experimental method to distinguish tRNA isodecoders
Experimental methods used to analyze the expression of RNA transcripts are generally based on differential hybridization of complementary oligonucleotide probes, primer extension using a mixture of deoxy- and dideoxynucleotide triphosphates, or RT–PCR using primers that allow differential extension by reverse transcriptase. These methods work well when the RNA transcript is not highly structured and post-transcriptional modifications do not impede hybridization or extension by reverse transcriptase. Using a purified tRNA mixture from HeLa cells, we tried to measure the expression of tRNA isodecoders by (i) differential hybridization of complementary oligonucleotides followed by RNase H cleavage; (ii) primer extension with up to 3× deoxynucleotide triphosphates and 1× dideoxynucleotide triphosphate; and (iii) RT–PCR using primers with different 3′-terminal nucleotides. Although HeLa tRNAs could sometimes be detected by at least one of these methods, the results were either poorly reproducible or of very low sensitivity (data not shown). The primary problems in applying these standard methods to eukaryotic tRNA appear to stem from the extensive tRNA structure and from tRNA modifications that interfere with hybridization and primer extension.
We devised a systematic method to distinguish tRNA isodecoder products that differ by a single nucleotide. The method is based on enzymatic ligation of two oligonucleotides using tRNA as the template (), similar to methods described for the analysis of mRNA transcripts (27). To quantify the relative amounts of two tRNA isodecoder products, two types of oligonucleotide pairs are needed as ligation substrates. The first type is efficiently ligated only on one of the two tRNA isodecoder templates. This type is designated as discriminating (D-oligo), and there are two different D-oligos for each tRNA isodecoder pair, one for each isodecoder. The second type is efficiently ligated on both tRNA isodecoders. This type is designated as non-discriminating (N-oligo), and there is one N-oligo for each tRNA isodecoder pair. The amount of ligation product obtained with a D-oligo therefore corresponds to [tRNA-1] or [tRNA-2], whereas the amount obtained with the N-oligo corresponds to [tRNA-1]+[tRNA-2]. Together, these measurements determine the relative amounts of a tRNA isodecoder pair between two samples, and may even be used to characterize the amounts of the two isodecoders within the same sample.
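The logic of the D-oligo/N-oligo combination can be sketched with an idealized linear model in which each product is proportional to the amount of template it ligates on. The efficiency constants (`k_d`, `k_n`) and the optional `leak` term (ligation of a D-oligo on the mismatched template) are hypothetical parameters introduced for illustration; in practice they would need to be calibrated, for example on model RNAs.

```python
def ligation_products(t1, t2, k_d=1.0, k_n=1.0, leak=0.0):
    """Idealized signals for a D-oligo specific for tRNA-1 and the shared N-oligo.

    t1, t2: amounts of the two isodecoder templates.
    k_d: D-oligo ligation efficiency on its matched template (tRNA-1).
    k_n: N-oligo efficiency, assumed equal on both templates.
    leak: D-oligo efficiency on the mismatched template, relative to k_d.
    """
    d_product = k_d * (t1 + leak * t2)
    n_product = k_n * (t1 + t2)
    return d_product, n_product


def estimate_fraction_t1(d_product, n_product, k_d=1.0, k_n=1.0):
    """Estimate [tRNA-1] / ([tRNA-1] + [tRNA-2]), neglecting D-oligo leak."""
    return (d_product / k_d) / (n_product / k_n)


if __name__ == "__main__":
    d, n = ligation_products(t1=3.0, t2=1.0, k_d=2.0, k_n=0.5)
    print(estimate_fraction_t1(d, n, k_d=2.0, k_n=0.5))  # 0.75
    # A 20-fold discriminating D-oligo (leak = 1/20) slightly overestimates it.
    d, n = ligation_products(t1=3.0, t2=1.0, leak=0.05)
    print(round(estimate_fraction_t1(d, n), 4))
```

The second case shows why a high discrimination factor matters: residual ligation on the wrong template biases the estimate upward.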
Figure 5 Detection of single nucleotide change in a model 30mer RNA by ligation. (A) The basic strategy. RNA oligonucleotides with a single nucleotide difference are used as templates for the ligation of two complementary oligonucleotides by T4 DNA ligase. To …
To find D-oligos for tRNA isodecoder pairs that differ by a single nucleotide, we first determined ligation efficiencies on four model 30mer RNAs (). These RNAs have identical sequences except at position 15, which is A, C, G or U. Ligation efficiency on these 30mer RNA templates was examined with 28 custom-ordered oligonucleotide pairs in three configurations ( and ). Set I and set II oligo pairs have the ligation junction on the 3′ and 5′ side, respectively, of nucleotide 15 of the RNA. Set III oligo pairs have the ligation junction displaced 3 nt downstream of nucleotide 15. Each set shares a single ‘anchor’ oligonucleotide substrate and has 12 (sets I and II) or 4 (set III) different ‘floater’ oligonucleotide substrates, giving 28 pairs in total. The floater oligonucleotides in sets I and II differ in sequence or backbone modification (). The floater oligonucleotides in set III differ in sequence but share the same 2′-deoxy backbone.
Our results show that these 28 oligo pairs are sufficient to provide unique D- and N-oligos for each of the six nucleotide pairs at position 15 of the RNA (), i.e. the six pairwise combinations of A, C, G and U. Under standard ligation conditions, the discrimination factors of the D-oligos range from 5-fold to 108-fold, which should be sufficient to discriminate single nucleotide changes in tRNA isodecoder pairs. With the D-oligos, the amount of ligation product correlates linearly with the composition of known mixtures of two model RNAs, whereas with the N-oligo it shows little dependence on the mixture composition (e.g. C15 and G15 shown in ).
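The screen described above amounts to computing, for each oligo pair and each of the six nucleotide pairs, a discrimination factor (ligation product on the preferred template divided by product on the other template) and sorting oligo pairs into D- or N-oligos. The cut-offs below (at least 5-fold for a D-oligo, at most 1.5-fold for an N-oligo) are hypothetical choices for illustration; the 5-fold value merely echoes the lowest discrimination factor reported above.

```python
import math
from itertools import combinations

# The six pairwise combinations of A, C, G and U at position 15.
NUCLEOTIDE_PAIRS = list(combinations("ACGU", 2))


def discrimination_factor(product_a: float, product_b: float) -> float:
    """Product on the preferred template divided by product on the other (>= 1)."""
    high, low = max(product_a, product_b), min(product_a, product_b)
    if low == 0:
        # No product on the weaker template: infinitely discriminating,
        # unless neither template gave any product at all.
        return math.inf if high > 0 else math.nan
    return high / low


def classify_oligo_pair(product_a, product_b, d_cutoff=5.0, n_cutoff=1.5):
    """Label an oligo pair 'D', 'N' or 'neither' for one nucleotide pair."""
    factor = discrimination_factor(product_a, product_b)
    if factor >= d_cutoff:
        return "D"
    if factor <= n_cutoff:
        return "N"
    return "neither"  # also covers NaN (no product on either template)


if __name__ == "__main__":
    print(NUCLEOTIDE_PAIRS)
    # Hypothetical band intensities on the two templates (arbitrary units).
    print(classify_oligo_pair(108.0, 1.0))  # D
    print(classify_oligo_pair(1.0, 1.2))    # N
    print(classify_oligo_pair(3.0, 1.0))    # neither
```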
D- and N-oligos for nucleotide pairs
tRNA isodecoders in human samples
We next applied the D- and N-oligos identified in the model RNA studies to human samples to demonstrate the feasibility of this ligation approach for the analysis of biological RNAs (). Probes for three tRNA isodecoder pairs were designed: (i) tRNAPro(CGG) U39 versus C39; (ii) tRNAAla(CGC) A42 versus G42; and (iii) tRNAArg(UCG) A51G52 versus C51A52, in addition to probes for the yeast tRNAPhe standard (Supplementary Figure S2). To facilitate detection, the lengths of the oligonucleotide substrates were designed such that their reaction products differ by at least 5 nt, so that all four tRNAs can be analyzed in a single ligation reaction. Where necessary, this length difference was achieved by adding a string of deoxy-A residues to the 5′ end of the oligo substrates (Supplementary Figure S2).
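One simple way to choose the 5′ deoxy-A extensions is a greedy pass over the products sorted by length, lengthening each just enough to sit at least 5 nt above the previous one. This is an illustrative sketch, not necessarily how the probes in Supplementary Figure S2 were designed; it treats each ligation product as an independent entry, and the core product lengths below are hypothetical.

```python
def assign_da_tails(core_lengths, min_gap=5):
    """Greedily add 5' dA tails so ligation products differ by >= min_gap nt.

    core_lengths: mapping of product name -> length (nt) without any tail.
    Returns a mapping of product name -> (dA tail length, final length).
    """
    plan = {}
    previous = None
    # Shortest products first; each is extended only if it would sit too
    # close to the previous (already placed) product.
    for name, core in sorted(core_lengths.items(), key=lambda item: item[1]):
        final = core if previous is None else max(core, previous + min_gap)
        plan[name] = (final - core, final)
        previous = final
    return plan


if __name__ == "__main__":
    # Hypothetical core product lengths (nt).
    cores = {"tRNAPhe std": 40, "Pro(CGG)": 40, "Ala(CGC)": 42, "Arg(UCG)": 50}
    for name, (tail, final) in assign_da_tails(cores).items():
        print(f"{name}: +{tail} dA -> {final} nt")
```

Processing products from shortest to longest keeps every tail as short as the spacing rule allows for that ordering.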
Figure 6 Detection of tRNA isodecoder distribution in human samples. (A) Simultaneous detection of three tRNA isodecoder pairs plus yeast tRNAPhe standard in a total tRNA mixture from HeLa. Asterisk (*) indicates a ligation product derived from the mixture of …
When these oligo pairs were ligated using the total tRNA mixture from HeLa cells, varying amounts of ligation products were obtained (). The identity of these ligation products was confirmed by carrying out the ligation separately with each oligo pair (data not shown). This result shows that the ligation strategy for studying tRNA isodecoder products works for biological RNA samples as well as for the model RNA.
We then used these oligo pairs to compare the amounts of the corresponding tRNA isodecoder products in total tRNA mixtures from six human tissues (). Total tRNAs from these tissues were first purified on a denaturing gel. To control for differences among the tissue RNA samples that might alter ligation efficiency, a constant, known amount of yeast tRNAPhe was included in every ligation reaction. The amounts of N-oligo ligation products show that, for both tRNAPro(CGG) and tRNAAla(CGC), brain has the most and ovary and vulva have the least of these tRNAs. The brain sample also produces more D-oligo products for tRNAPro(CGG) and tRNAAla(CGC).
After normalization to the yeast tRNAPhe standard, the ratio of the D-oligo product to the N-oligo product in a tissue can be used to compare the relative amount of a particular tRNA isodecoder across tissues (). This analysis shows that although the total amounts of these tRNAs can differ significantly among tissues, the relative amounts are all within 2-fold of those in brain.
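The normalization just described can be sketched as follows. The sketch assumes, as an illustration, that the D- and N-oligo products are each divided by the yeast tRNAPhe signal measured alongside them, and that each tissue's ratio is then expressed relative to brain; all signal values are hypothetical.

```python
def normalized_d_over_n(d, d_std, n, n_std):
    """D/N ratio after dividing each product by its tRNAPhe standard signal."""
    return (d / d_std) / (n / n_std)


def relative_to_reference(ratios, reference="brain"):
    """Express each tissue's D/N ratio relative to the reference tissue."""
    ref = ratios[reference]
    return {tissue: ratio / ref for tissue, ratio in ratios.items()}


def all_within_fold(values, fold=2.0):
    """True if every value lies within `fold`-fold of 1 in either direction."""
    return all(1 / fold <= v <= fold for v in values)


if __name__ == "__main__":
    # Hypothetical signals: (D, D standard, N, N standard) per tissue.
    signals = {
        "brain": (30.0, 10.0, 60.0, 10.0),
        "ovary": (6.0, 10.0, 20.0, 10.0),
    }
    ratios = {t: normalized_d_over_n(*s) for t, s in signals.items()}
    relative = relative_to_reference(ratios)
    print(relative, all_within_fold(relative.values()))
```

In this hypothetical case ovary has far less total tRNA than brain (N signal 2 versus 6), yet its isodecoder share is within 2-fold of brain's, which is the kind of comparison the D/N ratio is meant to isolate.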
We also attempted to determine the ‘absolute’ ratio of tRNAPro(CGG)-U39 to C39 using the corresponding pair of D-oligos, one preferring U over C and the other preferring C over U (D1 and D2 in Supplementary Figure S2). tRNAPro(CGG) can be detected in each tissue with either the D1 or the D2 oligo pair. However, on the model 30mer RNA templates, the U-preferring D-oligo ligates six times more efficiently than the C-preferring D-oligo. Assuming that this relative reaction factor (U/C = 6; ) is the same for tRNAPro(CGG), the fraction of tRNAPro(CGG)-U39 is obtained as [D(U39)-product]/([D(U39)-product] + 6×[D(C39)-product]) (). This fraction of tRNAPro(CGG)-U39 ranges from 0.09 to 0.20 among these tissues, and the remaining fraction is presumably tRNAPro(CGG)-C39. Three of the four tRNAPro(CGG) genes have C39 and one has U39. Hence, completely unbiased expression would give a U39 fraction of 0.25. This value is close to the fraction obtained for vulva, but markedly higher than that for ovary.
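The correction above follows from assuming each D-oligo product is proportional to its preferred template, with the U-preferring oligo six times more efficient: [U39] ∝ D(U39)/6 and [C39] ∝ D(C39), so [U39]/([U39] + [C39]) = D(U39)/(D(U39) + 6·D(C39)). The sketch below encodes that formula and the 0.25 expectation from gene counts. The signal values in the example are hypothetical, and the sketch neglects each D-oligo's residual ligation on its non-preferred template.

```python
def u39_fraction(d_u39, d_c39, efficiency_ratio=6.0):
    """Fraction of tRNAPro(CGG)-U39 from the two D-oligo products.

    Assumes product = efficiency x template amount, with the U-preferring
    D-oligo `efficiency_ratio` times more efficient than the C-preferring one,
    and ignores ligation on each oligo's non-preferred template.
    """
    return d_u39 / (d_u39 + efficiency_ratio * d_c39)


def expected_fraction_from_genes(u39_genes=1, total_genes=4):
    """U39 fraction expected if every tRNAPro(CGG) gene were expressed equally."""
    return u39_genes / total_genes


if __name__ == "__main__":
    print(expected_fraction_from_genes())     # 0.25
    # Hypothetical product intensities (arbitrary units).
    print(round(u39_fraction(6.0, 3.0), 2))  # 0.25
    print(round(u39_fraction(1.0, 1.0), 2))  # 0.14
```

Equal raw D-oligo signals therefore correspond to only about one U39 molecule in seven, because the U-preferring oligo's higher efficiency inflates its signal.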