We performed a genome-wide analysis of the tRNA gene repertoire of the Bos taurus genome. Although our analyses were complicated by the widespread distribution of SINEs, we were able to identify a representative set of tRNA genes in the cow. The impact of SINEs on tRNA gene prediction has previously been observed in other genomes. For example, the mouse genome contains over 25,000 tRNA genes and pseudogenes [16], while the rat genome contains over 175,000 tRNA genes and pseudogenes [17]. In dog, 401 tRNA-Lys(CTT) genes were predicted, many of which are false positives caused by a carnivore-specific family of SINEs that evolved from a tRNA-Lys(CTT) [22].
For the accurate identification of tRNA genes in the mouse genome, putative mouse tRNA genes predicted by tRNAscan-SE were analyzed using RepeatMasker to remove tRNA-related SINEs and then cross-validated by identifying orthologous mouse-human tRNA genes [16]. We adopted a similar strategy for the annotation of cow tRNA genes, except that we compared our putative tRNA genes to a larger set of genomes. While our RepeatMasker analysis filtered out a large number of false positives (~96% of total tRNA gene predictions), the remaining set of tRNA genes was still large compared with the numbers observed in other vertebrate genomes. The majority of putative tRNA genes not identified as repetitive elements were derived from three amino acid families (glutamic acid, glycine and lysine), suggesting that they might originate from the recent expansion of ruminant-specific tRNA-derived repeat families. In fact, the family of tRNA-derived SINEs present among ruminants, which was created from a tRNA-Glu [23], may account for many mis-annotated tRNA genes in the glutamic acid family. These SINEs have maintained a high level of sequence similarity to authentic tRNA genes and have tRNA-like predicted secondary structures (Additional file 9). Elevated numbers in the glycine and lysine amino acid families are probably due to point mutations in the anticodon region of the tRNA-related sequence of this tRNA-Glu SINE, leading to TCC, CCC, CTT and TTT anticodons, which belong to the glycine and lysine tRNA gene families.
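The point-mutation argument above can be checked with a small enumeration. The following Python sketch is an illustration only and not part of the analysis pipeline: it assumes strict Watson-Crick pairing (no wobble or base modification), the standard genetic code, and anticodons written 5' to 3' in the DNA alphabet; all function names are hypothetical. It lists every single-nucleotide substitution of the two glutamic acid anticodons (TTC and CTC) and keeps those that would decode a glycine or lysine codon.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def decoded_amino_acid(anticodon):
    """Amino acid (one-letter code) of the codon that pairs exactly with the anticodon.

    Assumption: strict Watson-Crick pairing, so the codon is the reverse
    complement of the anticodon.
    """
    codon = anticodon.translate(COMPLEMENT)[::-1]
    return CODON_TABLE[codon]


def single_substitutions(anticodon):
    """Yield every anticodon that differs from the input at exactly one position."""
    for i, base in enumerate(anticodon):
        for alt in BASES:
            if alt != base:
                yield anticodon[:i] + alt + anticodon[i + 1:]


def neighbours_decoding(source_anticodons, target_amino_acids):
    """Map each one-substitution mutant that decodes a target amino acid
    to the set of source anticodons it can be derived from."""
    hits = {}
    for src in source_anticodons:
        for mutant in single_substitutions(src):
            if decoded_amino_acid(mutant) in target_amino_acids:
                hits.setdefault(mutant, set()).add(src)
    return hits
```

Under these assumptions, `neighbours_decoding(["TTC", "CTC"], {"G", "K"})` returns the four anticodons TCC, TTT, CCC and CTT, each a single substitution away from a glutamic acid anticodon, which is consistent with the anticodons named in the text.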
To help distinguish between well-conserved tRNA-like sequences and functional tRNA genes, we used a comparative genomics approach based on tRNA genes predicted for the human, chimpanzee, mouse, rat, horse, dog, chicken and fugu genomes. The fugu genome, in particular, provides a good reference as it lacks many of the repetitive elements present in other vertebrate genomes [24]. Although a number of functional tRNA genes may be filtered out because of the accumulation of mutations at neutral sites, we wanted to define a confident set of cow tRNA genes. With a 95% similarity threshold we identified 406 different tRNA genes with 41 distinct anticodons, encoding 135 tRNAs with unique sequences. However, this set of tRNA genes was only capable of reading 55 of the 61 codons. We manually added the respective tRNA genes for the missing codons, which included 36 additional sequences. Many of these excluded tRNA genes were longer than the average 73 bp because of the presence of a variable arm in the tRNA, which lowered the sequence similarity score to less than 95%. We performed a cluster analysis of these cow tRNA genes and explored the relationships between them to understand more about tRNA gene evolution in the cow. In our analysis, tRNA genes with the same anticodon generally formed single clades with high posterior probability support. One exception that we investigated in more detail was the glycine family. Vertebrates contain three distinct but related families of tRNA-Gly genes, with TCC, GCC and CCC anticodons. However, in mammals an additional family of tRNA-Gly(CCC) genes is present (subfamily 2). This tRNA-Gly(CCC) appears to have arisen in the ancestor of the eutherian mammals and marsupials by a mutation in the anticodon of a member of the tRNA-Gly(GCC) family, as this new family was not identified among the non-mammalian genomes we investigated (Additional file 10). For the other, more conserved subfamily 1 tRNA-Gly(CCC) genes, it is not clear whether the apparent absence of orthologs in the current platypus genome assembly is due to the incomplete nature of the assembly or to loss of the genes in platypus. The same uncertainty applies to the identification of only one member of the family in the cow genome. Given the redundancy of function and the distinctive sequence differences between the two tRNA-Gly(CCC) families, it will be interesting to uncover the expression patterns of tRNA genes from the glycine family.
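The codon-coverage check mentioned earlier in this paragraph (which of the 61 sense codons can be read by a given set of anticodons) can be sketched as follows. This is a minimal illustration under stated assumptions and not the procedure used in the analysis: the wobble rules in `WOBBLE` are a deliberately simplified approximation (G34 reads C and T, C34 reads G, T34 reads A and G, A34 is treated as inosine and reads T, C and A), anticodons are written 5' to 3' in the DNA alphabet, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Assumed, simplified wobble rules: first anticodon base (position 34)
# mapped to the third codon bases it is taken to read.
WOBBLE = {"G": "CT", "C": "G", "T": "AG", "A": "TCA"}


def codons_read(anticodon):
    """Return the codons read by an anticodon under the simplified rules above."""
    first, second, third = anticodon  # anticodon positions 34, 35, 36
    # Codon positions 1 and 2 pair strictly with anticodon positions 36 and 35.
    prefix = COMPLEMENT[third] + COMPLEMENT[second]
    return {prefix + wobble_base for wobble_base in WOBBLE[first]}


def unread_sense_codons(anticodons):
    """Return the sense codons not read by any anticodon in the collection."""
    sense = {"".join(c) for c in product(BASES, repeat=3)} - STOP_CODONS
    covered = set()
    for anticodon in anticodons:
        covered |= codons_read(anticodon)
    return sense - covered
```

Applied to a list of anticodons from a filtered gene set, `unread_sense_codons` would report the codons for which additional tRNA genes need to be sought; the exact count depends on the wobble rules chosen, so it should not be expected to reproduce the figures above without the same rules and input.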
While we were able to gain an overview of tRNA genes within the cow genome, we also wanted to trace the evolutionary history of cow tRNA genes across various organisms. We used phylogenetic profiles to describe the conservation patterns of cow tRNA genes across 20 species. Generally, genomes that were phylogenetically closer to the cow genome contained a higher number of tRNA orthologs. However, there was a large difference in the number of tRNA orthologs (defined as sequences with ≥ 95% sequence identity) between vertebrate and invertebrate genomes. The number of cow tRNA orthologs among vertebrates is 2–3-fold larger than the number among invertebrates. Expansion of vertebrate genomes, resulting in the emergence of paralogous copies of tRNA genes, may explain the larger numbers of tRNA orthologs observed in vertebrates. This is in line with the observation that the number of tRNA genes in a genome is positively correlated with genome size [25]. However, because of the large evolutionary distance between cow and invertebrates, tRNA orthologs in invertebrates may have been omitted owing to the stringency of our similarity threshold. The phylogenetic profiles also revealed a dispersed distribution of cow tRNA orthologs. Whilst roughly half of the cow tRNA orthologs are highly conserved and present in all vertebrate genomes, the other half display a much more random distribution. This observation is in agreement with the hypothesis of a core and peripheral set of tRNA genes [26]. The authors of that study suggest that tRNA gene evolution may be a repetitive process, which would explain the distribution of cow tRNA genes observed in our phylogenetic profiles.
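The distinction between conserved and dispersed profiles can be illustrated with a short Python sketch. It is a hypothetical example and not the code used for the analysis: a phylogenetic profile is represented as the set of species in which a cow tRNA gene has an ortholog at the chosen identity threshold, and a gene is called "core" here if that set contains every vertebrate in the comparison. The gene labels, species lists and function name are invented for illustration.

```python
def split_core_peripheral(profiles, vertebrates):
    """Split genes into those with orthologs in all listed vertebrates and the rest.

    profiles: dict mapping a gene label to the set of species with an ortholog.
    vertebrates: set of vertebrate species used in the comparison.
    """
    core, peripheral = [], []
    for gene, species_with_ortholog in profiles.items():
        if vertebrates <= species_with_ortholog:
            core.append(gene)
        else:
            peripheral.append(gene)
    return sorted(core), sorted(peripheral)


# Toy data (invented, for illustration only).
TOY_VERTEBRATES = {"human", "mouse", "chicken"}
TOY_PROFILES = {
    "gene_1": {"human", "mouse", "chicken", "fly"},  # present in all vertebrates
    "gene_2": {"human", "chicken"},                  # missing in mouse
    "gene_3": {"human", "mouse", "chicken"},         # present in all vertebrates
}
```

With the toy data, `split_core_peripheral(TOY_PROFILES, TOY_VERTEBRATES)` places `gene_1` and `gene_3` in the core list and `gene_2` in the peripheral list.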