Compared with other conserved housekeeping protein families, such as ribosomal proteins, aminoacyl-tRNA synthetases (aaRSs) show an apparently high propensity to add new sequences, and especially new domains, during the evolution of higher eukaryotes. The stepwise emergence of those new domains is consistent with their involvement in a broad range of biological functions beyond protein synthesis, and correlates with the increasing biological complexity of higher organisms. The new domains have been extensively characterized based on their evolutionary origins and their sequence, structural and functional features. While some of the domains are uniquely found in aaRSs and may have originated from nucleic acid binding motifs, others are common domain modules mediating protein-protein interactions that play a critical role in the assembly of the multi-synthetase complex (MSC). Interestingly, the MSC has developed from a miniature complex in yeast into a large, stable complex in organisms from insects to humans. The human MSC consists of 9 aaRS activities (LysRS, ArgRS, GlnRS, AspRS, MetRS, IleRS, LeuRS and the bifunctional GluProRS, which counts as two) and 3 scaffold proteins (AIMP1/p43, AIMP2/p38 and AIMP3/p18), and has a molecular weight of 1.5 million Da. The MSC has been proposed to have a functional dualism: both facilitating protein synthesis and serving as a reservoir of non-canonical functions associated with its synthetase and non-synthetase components. Importantly, domain additions and functional expansions are not limited to the components of the MSC and are found in almost all aaRS proteins. From a structural perspective, multi-functionalities are represented by multiple conformational states. In fact, alternative conformations of aaRSs have been generated by various mechanisms, from proteolysis to alternative splicing and posttranslational modifications, as well as by disease-causing mutations. Therefore, the metamorphosis between different conformational states is connected to the activation and regulation of the novel functions of aaRSs in higher eukaryotes.
The first surprise of the human genome project was the discovery of an unexpectedly small number of protein-coding genes. The explanation that has slowly emerged since is that human genes possess more diverse functions and their regulation is more complicated than their counterpart genes in lower organisms. In this regard, the increasing complexity of aminoacyl-tRNA synthetases (aaRSs), at the sequence, structural and functional levels, stands out as a prominent example.
Known as an essential component of the translational apparatus, the aaRS family catalyzes the first step of protein synthesis: the specific attachment of each amino acid to its cognate tRNA. While preserving this essential role, higher eukaryotic tRNA synthetases have developed other roles during evolution. Human cytoplasmic tRNA synthetases, in particular, mediate diverse functions in different pathways including angiogenesis, inflammation, development and tumorigenesis. The functional expansion of aaRSs is thought to be intimately associated with their continuous addition of new domains and motifs during the evolution of higher eukaryotes. Here we review the current knowledge on how human cytoplasmic tRNA synthetases developed their complexity, in sequence and in structure, to expand their “functionome”. Importantly, not all conserved housekeeping protein families show the increasing complexity of the aaRS family.
A comparison of the protein sequences of eukaryotic cytoplasmic aaRSs with their prokaryotic counterparts makes it immediately obvious that the eukaryotic enzymes are generally larger, mainly because of extensions or insertions of conserved sequences (Figure 1). Interestingly, most of these additions are found, at least by structural predictions, to form well-folded structures. Some of them are homologous to structural modules such as leucine-zipper and glutathione S-transferase (GST) domains that are broadly contained in human proteins, whereas others are only found in aaRSs or aaRS-associated protein factors. Those aaRS-specific domains include two common domains (WHEP and EMAPII) that are found in more than one member of the aaRS protein family, and several unique domains, each of which is found in a single aaRS. In addition to those well-folded domains, some shorter sequences that may not form defined structures are also found as appendices in eukaryotic aaRSs (Figure 1).
Remarkably, the elaboration of such new domains in aaRS proteins reflects the increasing biological complexity of the higher organisms wherein the proteins reside. The more evolutionarily advanced an organism, the more of those domains and sequences appear in its aaRS proteins. Therefore, the evolutionary process of adding those domains and sequences to aaRSs appears to be continuous and almost irreversible (Figure 2).
The simplest form of a eukaryotic domain addition is the N-terminal helix. Single helix sequences are found in the N-terminal region of eukaryotic LysRS and AspRS. These two aaRSs belong to Class IIb of the aaRS family and are more closely related to each other than to other aaRS members. Prokaryotic AspRS and LysRS consist only of an OB (oligonucleotide binding)-fold tRNA anticodon-binding domain and an aminoacylation domain. However, during evolution from single-celled eukaryotes such as yeast to humans, each of the two synthetases acquired a ~30–50 aa long N-terminal extension in front of the tRNA binding domain. The extensions, as suggested by structural prediction, contain a long helix of 20–40 aa (Figure 3). The helical conformation was confirmed by NMR structure determination of the N-terminal extensions of human AspRS and human LysRS [3, 4].
These N-terminal helices have evolved to mediate biological activities. They are mostly amphiphilic, with charged residues on one side and hydrophobic residues on the other side. For LysRS, positively charged residues dominate the hydrophilic side of the N-terminal helix. Early work indicated that this helical extension in human LysRS binds to the elbow region of tRNALys and enhances the binding affinity of the synthetase for the tRNA [5, 6]. Human LysRS also plays an important role in HIV infection by delivering tRNALys3 (which acts as a primer for viral reverse transcription) into the virion. Importantly, the function of LysRS in HIV packaging depends on the N-terminal extension, presumably because of its tRNA binding property.
Interestingly, the positively charged N-terminal helix of human LysRS interacts not only with nucleic acids but also with phospholipids and proteins. Recent work has shown that human LysRS, upon phosphorylation, is translocated to the plasma membrane, where its N-terminal region, including the N-terminal extension, interacts with the transmembrane region of the 67LR laminin receptor. The interaction inhibits the ubiquitin-dependent degradation of 67LR, thereby enhancing laminin-induced cancer cell migration.
For yeast AspRS, the N-terminal helix is also positively charged on one side, and has been demonstrated to bind tRNA. Interestingly, compared to the yeast enzyme, human AspRS possesses more negatively charged residues on the N-terminal helix (Figure 3), suggesting that the amphiphilic helices may have evolved from facilitating tRNA binding in lower eukaryotes to mediating other interactions in higher eukaryotes [10–12].
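The amphiphilic character described above (charged residues on one face of the helix, hydrophobic residues on the other) can be quantified with a hydrophobic moment. The following is a minimal sketch, not the method used in the cited structural studies: it assumes an ideal α-helix with 100° of rotation per residue and uses the Kyte-Doolittle hydropathy scale. The two test sequences are hypothetical illustrations, not the actual N-terminal extensions of LysRS or AspRS.

```python
import math

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg=100.0):
    """Mean hydrophobic moment of seq, modeled as an ideal alpha helix.

    Each residue is placed on a helical wheel at i * angle_deg degrees and
    contributes a vector whose length is its hydropathy value. A large
    resultant means hydrophobic and polar residues sit on opposite faces.
    """
    sum_x = sum_y = 0.0
    for i, aa in enumerate(seq):
        theta = math.radians(i * angle_deg)
        sum_x += KD[aa] * math.cos(theta)
        sum_y += KD[aa] * math.sin(theta)
    return math.hypot(sum_x, sum_y) / len(seq)


# Hypothetical sequences with identical composition (9 Leu, 9 Lys).
# In the first, Leu occupies one face of the helical wheel; in the second,
# the same residues are arranged as two blocks.
AMPHIPHILIC = "LKKLLKKLLKLLKKLLKK"
BLOCKED = "LLLLLLLLLKKKKKKKKK"

if __name__ == "__main__":
    print("amphiphilic:", round(hydrophobic_moment(AMPHIPHILIC), 2))
    print("blocked:    ", round(hydrophobic_moment(BLOCKED), 2))
```

Because both sequences have the same composition, any difference in the moment reflects only how the residues are distributed around the helix axis, which is the property the text attributes to these N-terminal extensions.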
Another appended domain that facilitates tRNA binding is the EMAPII domain. Although originally discovered as a cytokine (endothelial monocyte activating polypeptide II), it is in fact a proteolytic product of MSC p43/AIMP1, a scaffold protein in the multi-aminoacyl-tRNA synthetase complex (MSC; see below for more information on the MSC).
EMAPII domains are only found in aaRSs (C. elegans MetRS and metazoan TyrRS) and in aaRS-associated proteins (p43/AIMP1 and yeast Arc1p, a scaffold protein for the yeast aaRS complex; see below). This strict distribution of the EMAPII domain suggests that its function evolved specifically for tRNA synthetases. The N-terminal portion of EMAPII (~160 aa) shares sequence homology with Trbp111, a 111 aa, free-standing tRNA binding protein found in bacteria. Structural analysis of EMAPII has revealed that the monomeric EMAPII mimics the dimeric structure of Trbp111 by forming a pseudo-dimer interface with its C-terminal sequences (Figure 4). Trbp111 specifically recognizes tRNA by binding to the elbow region of all tRNAs. In certain bacteria and plants, a Trbp111-like domain is fused to the C-terminus of MetRS to enhance its aminoacylation activity by facilitating the binding of tRNA.
A domain homologous to EMAPII is present in TyrRS from insects to humans. The EMAPII domain in TyrRS, like many other new domains, is dispensable for aminoacylation. However, removal of the EMAPII domain (to generate mini-TyrRS) through natural proteolysis outside the cell activates a cytokine-like function embedded in human TyrRS [17, 18]. Removal of EMAPII appears to expose an otherwise masked tri-peptide ELR cytokine motif in the catalytic domain. A separate cytokine activity associated with the EMAPII domain is also activated when it is released from human TyrRS. Interestingly, the ELR motif is not conserved in bacterial and lower eukaryotic TyrRSs, but only starts to appear from insects, concurrent with the addition of the EMAPII domain.
The origin of aaRS appended domains is not confined to tRNA-binding motifs. For example, yeast MetRS does not have a Trbp111 domain as seen at the C-terminus of a prokaryotic MetRS (such as that of E. coli), but instead has a GST domain at its N-terminus. A GST domain sequence is also present in MetRS from insects to humans and in vertebrate ValRS [20–22]. Removal of the GST domain in yeast MetRS abolished the Arc1p-dependent aminoacylation of tRNAMet. Similarly, cleavage of the GST domain in human MetRS rendered the enzyme inactive, indicating the importance of the GST domains in aminoacylation.
In addition to MetRS and ValRS, two other class I tRNA synthetases, GluRS and CysRS, and three aaRS-associated proteins (yeast Arc1p and two of the three auxiliary factors of the MSC, p38/AIMP2 and p18/AIMP3) all contain a GST domain. These GST domain-containing proteins are found in complexes with other proteins [25, 26], supporting the general idea that GST domains act as protein-protein binding modules. In fact, GST domains do not necessarily possess enzyme activity and are commonly used for protein assembly and for regulating protein folding (such as the GST domains in the S-crystallins, eukaryotic elongation factors 1-γ (eEF-1γ) and 1-β (eEF-1β), and the heat shock protein 26 (HSP26) family of stress-related proteins) [27–29]. Interestingly, the yeast MetRS-Arc1p–GluRS ternary complex is formed through two binary GST-GST domain interactions, MetRS-Arc1p and Arc1p–GluRS (the detailed assembly of the complex is discussed in the next section). Consistently, human MetRS was shown to interact with two other GST domain-containing scaffold proteins (p38/AIMP2 and p18/AIMP3) through its GST domain [30–32], deletion of which abolished the incorporation of MetRS into the MSC in cells. On the other hand, although the yeast GluRS is known to interact with Arc1p through their GST domains, the function of the GST domain in human bifunctional GluProRS remains to be defined. The role of the N-terminal GST domain in mammalian CysRS, one of the latest domain additions in aaRSs, is also undefined. Intriguingly, two versions of CysRS (one with the GST domain and one without) are produced by alternative splicing in mouse and human [33, 34].
Sequence analysis indicates that all GST domains in aaRSs and in MSC p18/AIMP3 and MSC p38/AIMP2 share strong homology with the eukaryote-specific elongation factors eEF-1β and eEF-1γ [26, 27]. Phylogenetic analysis of these GST domains showed stronger homology with each other than with other common GST proteins or GST domains found in bacteria (Figure 5). Possibly, the GST domains in the aaRS protein family (including the elongation factors) were generated from the same origin by gene duplication events. It is interesting to note that the yeast GluRS, MetRS and Arc1p genes are located on the same chromosome (VII). However, how these gene duplication events were specifically confined to aaRS proteins is currently unknown.
Many appended domains are involved in functions other than aminoacylation, such as the regulation of inflammation, angiogenesis and even p53 activation. The WHEP domain is one example. Initially discovered in three human aaRSs [TrpRS (W), HisRS (H), GluProRS (EP)], hence its name, the WHEP domain is ~50 aa long and shares apparent sequence homology among different aaRSs. Two other human aaRSs, GlyRS and MetRS, also contain a WHEP domain, at their N- and C-termini, respectively. Early studies, based on sequence similarity and intron positions, suggested that the WHEP domain might have first appeared in HisRS in single-cell eukaryotes (e.g. yeast) and then propagated to other aaRSs [35, 36]. Interestingly, the acquisition of the WHEP domain by GluProRS happened concurrently with the fusion of GluRS and ProRS into one gene, an event that took place prior to the divergence of cnidarians and bilaterians. Among these aaRSs, TrpRS and MetRS appear to have the latest WHEP domain acquisition events, which did not occur until the first vertebrates [22, 35]. In contrast to other aaRSs that have only one WHEP domain, GluProRS contains a varying number of WHEP domains (3–6) depending on the species. In particular, human GluProRS contains three consecutive WHEP domains between the N-terminal GluRS and the C-terminal ProRS.
The spreading of this short sequence among several aaRSs suggests that the WHEP domain might be a common tRNA-binding motif, though experiments testing this hypothesis have not arrived at a clear conclusion. Structural and functional analyses have indicated that the WHEP domains fold as a simple helix-turn-helix structure and act as a unique RNA recognition motif (Figure 6). However, the WHEP domain does not significantly affect, at least in in vitro studies, the aminoacylation efficiency or the tRNA binding affinity of its host aaRSs, including TrpRS, GlyRS and GluProRS [40–42]. Recent studies revealed that the WHEP domains in human GluProRS perform non-canonical functions through protein-protein and protein-RNA interactions. For instance, the WHEP domains directly mediate the interaction between the synthetase and NSAP1 (NS1-associated protein), L13a and GAPDH (glyceraldehyde 3-phosphate dehydrogenase) to form a gamma-IFN-activated inhibitor of translation (GAIT) complex [43–45], which interacts with eIF4G to block 43S recruitment and mRNA translation. Using their RNA binding property, the WHEP domains are also responsible for recognizing the GAIT element located in the 3′-UTR of target mRNAs.
Although not needed for aminoacylation, the WHEP domain appears to be a regulator of the non-canonical functions of human TrpRS. In human TrpRS, the N-terminally fused single WHEP domain undergoes a conformational change when the synthetase is bound to Trp-AMP (the aminoacylation reaction intermediate), which places the WHEP domain close to the active site pocket. The WHEP domain can be specifically removed by proteolysis or alternative splicing to generate fragments of TrpRS (T2-TrpRS and mini-TrpRS, respectively) that exhibit angiostatic activity through their interaction with the extracellular domain of VE-cadherin on the surface of endothelial cells [48–50]. The WHEP domain of TrpRS also mediates direct interactions with DNA-PK (DNA-dependent protein kinase) and PARP-1 (poly(ADP-ribose) polymerase 1) in the nucleus to activate p53. Finally, it is interesting to note that, like the EMAPII domain, the WHEP domain only exists within AARS genes. This aaRS-specific domain expansion is suggestive of a special selective pressure to develop new functions for AARS genes during the evolution of higher eukaryotes. Understanding such pressure may be instructive for understanding the various expanded functions of aaRSs, and vice versa.
Certain domains expanded in aaRS proteins are only involved in protein-protein interactions. One such domain, the leucine zipper, is a standard module for protein assembly that mediates protein-protein interactions in many biological processes, such as forming the SNARE complex in vesicle trafficking and forming the trimeric structure of gp41 to facilitate the entry of HIV into target cells [52, 53]. The leucine zipper is a helical motif that has leucine residues (or other hydrophobic residues) at the fourth position of each heptad repeat, that is, at every seventh residue, so that the protruding isobutyl side chains are lined up on one side of the helix. This design creates a hydrophobic spine that interlocks with its partner to form a coiled-coil zipper.
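The heptad periodicity just described lends itself to a simple sequence scan. The sketch below is a deliberately crude illustration under stated assumptions, not a coiled-coil predictor: it only looks for runs in which a hydrophobic residue recurs every seven positions, ignores the other heptad positions and helix propensity, and the function name, the `allowed` residue set, the `min_repeats` threshold and the example sequences are all hypothetical.

```python
def heptad_leucine_runs(seq, min_repeats=4, allowed="LIVM"):
    """Find runs where a hydrophobic residue recurs every 7th position.

    Returns a list of (start_index, repeats) tuples, using 0-based indices.
    A run is reported only once, at its first position.
    """
    hits = []
    n = len(seq)
    for start in range(n):
        if seq[start] not in allowed:
            continue
        # Skip positions that merely continue a run begun 7 residues earlier.
        if start >= 7 and seq[start - 7] in allowed:
            continue
        repeats = 0
        pos = start
        while pos < n and seq[pos] in allowed:
            repeats += 1
            pos += 7
        if repeats >= min_repeats:
            hits.append((start, repeats))
    return hits


# Hypothetical test sequences (not taken from any aaRS).
ZIPPER_LIKE = "KAELEEK" * 4                                 # Leu at index 3 of each heptad
BROKEN = "KAELEEK" * 2 + "KAEPEEK" + "KAELEEK"              # third heptad has Pro instead

if __name__ == "__main__":
    print(heptad_leucine_runs(ZIPPER_LIKE))
    print(heptad_leucine_runs(BROKEN))
```

A scan of this kind only flags candidate regions; whether a flagged region actually forms a coiled coil depends on the rest of the heptad and on its binding partner.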
Leucine zippers exist in ArgRS (a component of the MSC) in higher eukaryotes from insects to humans, but not in lower eukaryotes such as yeast. Leucine zippers also exist in the MSC scaffold proteins p43/AIMP1 and p38/AIMP2, suggesting that they play critical roles in the assembly of the MSC (Figure 7). The leucine zipper motifs in ArgRS interact with the leucine zippers in p43/AIMP1, which in turn interact with the leucine zipper motif in p38/AIMP2. Therefore, these three proteins may form an ArgRS-p43-p38 subcomplex of the MSC through several coiled-coil structures. Interestingly, a shorter form of human cytoplasmic ArgRS, which is produced from an alternative translation initiation site on the same mRNA [55–57], lacks the N-terminal leucine zipper motifs (residues 1–72) and the ability to associate with the MSC.
Leucine zippers are completely absent from aaRSs that are not associated with the MSC, suggesting that leucine zippers in aaRSs are exclusively used for the assembly of the MSC. A distinct feature of the leucine zipper motif, compared with other protein-protein interaction modules, is its linear and extended geometry, which may be an important characteristic for providing a framework to support the structure of the MSC.
Besides the above-mentioned appended domains that are shared among aaRSs or aaRS-associated factors, other domains and sequences that have evolved within eukaryotic aaRSs share no detectable sequence similarity to common structural modules, and for the most part, each domain is present in only one specific aaRS. Because of their uniqueness, these domains are named UNE-X, where X represents the amino acid specificity (in single-letter code) of the aaRS to which each domain is appended.
Recently, several studies have suggested the importance of these unique domains in both canonical and non-canonical functions of aaRSs. Eukaryotic IleRS contains two large additions at the C-terminus (named UNE-I1 and UNE-I2). UNE-I1 (residues 968–1064) is found throughout eukaryotes (from yeast to human), while UNE-I2 (residues 1065–1266) exists only in vertebrates (Figure 8). Sequence analysis shows that UNE-I2 contains two repetitive sequences, each of ~90 aa. This region interacts with the WHEP domains of EPRS and therefore may play a role in retaining IleRS in the MSC [31, 59].
In higher eukaryotes (from C. elegans to human), LeuRS contains a unique domain of ~110 aa at the C-terminus (UNE-L) (Figure 8). UNE-L is predicted to be rich in secondary structure (beta-strands and alpha helices) that presumably folds into a discrete 3D structure. Human LeuRS was recently found to function as a leucine sensor for the mTOR pathway [60–62]. Specifically, human LeuRS associates with and activates the RagD GTPase of mTORC1 in a leucine-dependent manner. Removal of the C-terminal ~220 residues (including UNE-L and a LeuRS-specific domain) abolished the interaction with RagD. Interestingly, the yeast LeuRS, which does not contain UNE-L, also controls the TOR pathway. However, in contrast to the human LeuRS, the N-terminal CP1 (editing) domain of yeast LeuRS was proposed to be the binding site for the GTPase, suggesting that the mechanism by which LeuRS regulates the mTOR pathway might be substantially different in yeast and in mammals [60, 62]. It remains to be determined whether the presence of UNE-L has a role in relocating the RagD binding site from the editing domain in yeast to the C-terminus of LeuRS in human.
UNE-F is found at the N-terminus of the PheRS α-subunit in eukaryotes (Figure 8). The structure of human PheRS reveals that UNE-F folds into three consecutive DNA-binding fold domains (DBD-1, −2, −3) with intervening sequences. Each DBD contains three α helices folded against a three-stranded antiparallel β-sheet. The topology of the DBDs is found in many DNA-binding proteins as well as in double-stranded RNA adenosine deaminase. Modeling of tRNAPhe onto human PheRS suggested that UNE-F interacts with the D and T loops and the anticodon stem of the tRNA. Deletion of UNE-F abolishes the aminoacylation activity of PheRS, consistent with its predicted role in binding and recognition of tRNAPhe. Interestingly, T. thermophilus PheRS, with an N-terminal structure distinct from the eukaryotic DBD, binds to a specific DNA sequence on its own gene. The presence of three DNA-binding modules in UNE-F suggests that human PheRS might have non-canonical functions involving dsDNA/dsRNA binding, such as in transcriptional and translational regulation.
Metazoan and fungal AsnRS differ from their bacterial homologues by the addition of a conserved N-terminal extension of ~110 aa (UNE-N) (Figure 8). Recent structural characterization showed that UNE-N contains a structured region with a novel fold (residues 1–73) that is connected to the remainder of the enzyme by an unstructured linker (residues 74–110). As shown by NMR, the folded portion of UNE-N features a lysine-rich helix that interacts with tRNA. Whether UNE-N is also involved in non-canonical functions of AsnRS remains to be determined.
GlnRS is found predominantly in eukaryotes, whereas in most prokaryotes Gln-tRNAGln is synthesized by an indirect pathway in which GluRS first forms Glu-tRNAGln, which a tRNA-dependent amidotransferase then converts to Gln-tRNAGln. Compared to the few existing bacterial GlnRSs, the eukaryotic enzymes contain an N-terminal extension of ~200 aa (UNE-Q) (Figure 8). Yeast mutants lacking UNE-Q exhibit growth defects and have reduced complementarity for tRNAGln and glutamine. Structural analysis shows that UNE-Q consists of two subdomains that resemble the two adjacent tRNA specificity-determining domains in the GatB subunit of GatCAB, the trimeric amidotransferase that can use both Glu-tRNAGln and Asp-tRNAAsn as substrates to form Gln-tRNAGln and Asn-tRNAAsn, respectively. The two subdomains of UNE-Q are connected by a conserved hinge region, which, when mutated, reduces the affinity of yeast GlnRS for tRNAGln. UNE-Q gives another example of how domain addition or exchange is highly selective for aaRSs or aaRS-related genes.
In addition to the N-terminal GST domain that seems to exist only in mammals, all eukaryotic CysRSs contain two other sequence additions (UNE-C1 and UNE-C2) (Figure 8). UNE-C1 is inserted in human CysRS between residues 108 and 223, in front of the well-known CP1 insertion (residues 273–419). UNE-C2 is a C-terminal extension of ~150 aa. Both UNE-C1 and UNE-C2 are unique to CysRS, and show no apparent sequence homology to other domains or motifs.
The smallest UNE domain is the UNE-S motif (~30–40 aa), located at the C-terminus of vertebrate SerRS (Figure 8). UNE-S was found to be essential for vascular development. Deletion of the C-terminal sequences of SerRS, including UNE-S, led to abnormal blood vessel formation that resulted in premature death in zebrafish [68, 69]. Removal of UNE-S has little effect on the aminoacylation activity of human SerRS. Further studies discovered that a significant portion of SerRS is in the nucleus of human umbilical vein endothelial cells, and that the localization is directed by a classical nuclear localization signal (NLS) embedded in UNE-S. Interestingly, SerRS regulates the expression of VEGFA in the nucleus through an unknown mechanism. This novel function is independent of the canonical aminoacylation activity of SerRS, as an aminoacylation-defective form of full-length SerRS could fully rescue the vascular phenotype in zebrafish. Therefore, acquisition of UNE-S appears to be important for the non-canonical function of vertebrate SerRS.
In summary, higher eukaryotic aaRSs continue to evolve through the addition of new domains and sequences. Many of these domains are related to the canonical function, especially tRNA binding, while others are dispensable for aminoacylation. Although the timing of each domain addition may be different, one common feature is that once a domain is added to an aaRS, the process is generally irreversible and the domain is conserved from then on as an integral part of that aaRS. Therefore, the timing of domain acquisition may be linked to the new biology associated with the increasing complexity of the organism and could provide important hints about the potential functions of these domains beyond the canonical aminoacylation function of their prokaryotic homologues. We predict that more non-canonical functions will be discovered that are linked to the acquisition of the UNE domains and other appended domains of aaRSs.
Prokaryotic aaRSs are a large family of ~20 proteins that perform a common function, namely, aminoacylation of tRNA. Unlike many other housekeeping machineries, such as the RNA polymerase complex and the ribosome, they do not form a complex. This feature is presumably related to the independence of the aminoacylation activity for each amino acid and its tRNAs. It might even be beneficial not to have aaRSs too near each other in a complex, as they might otherwise sterically hinder each other's tRNA binding or attract tRNA nonspecifically, which could increase the chance of mis-binding and mis-charging. Although several small (binary/ternary) aaRS complexes have been found in archaea, they appear to occur serendipitously in only certain species, rather than being a common feature of the family [70, 71]. Thus the emergence of such a conserved, stable, and large complex of aaRSs in higher eukaryotes is remarkable.
In eukaryotes, the complexity of an aaRS complex increases with the complexity of the host organism. For instance, in the single-celled yeast (S. cerevisiae), the aaRS complex is a relatively simple ternary complex comprising MetRS, GluRS and Arc1p (aminoacyl-tRNA synthetase cofactor 1 protein) (Figure 9). With a GST domain at the N-terminus and an EMAPII-like domain at the C-terminus, Arc1p is essentially a fusion protein that links a protein-binding module to a tRNA-binding domain. Concurrently, both MetRS and GluRS in yeast acquired an N-terminal GST domain. It is through the GST domains that Arc1p links MetRS and GluRS together.
As mentioned in 2.2, Trbp111 is a non-specific tRNA-binding domain. The Trbp111-like domain in Arc1p not only facilitates the binding of tRNAMet and tRNAGlu to the complex, but also nonspecifically binds to other tRNAs. Although Arc1p is not essential in vivo, it enhances the aminoacylation activity by two orders of magnitude for MetRS and by one order of magnitude for GluRS, as shown by in vitro kinetic studies. It is worth noting that the E. coli MetRS has a Trbp111 domain at its C-terminus. Therefore, the same strategy for enhancing MetRS activity in E. coli is also used in yeast through the formation of a MetRS-Arc1p–GluRS complex. Interestingly, a recent study reported that the MetRS-Arc1p–GluRS complex can bind and mismethionylate many tRNA species in vitro. Moreover, a similar effect on tRNA mismethionylation was achieved by fusing the Trbp111 domain of Arc1p to the yeast MetRS.
The MetRS-Arc1p–GluRS complex has a number of other functions. One is to regulate the cytoplasmic localization of MetRS and GluRS. Disruption of the complex by deletion of the GST domain of Arc1p resulted in strong nuclear localization of all three components; GluRS and Arc1p were found to associate with apurinic/apyrimidinic sites of damaged DNA in the nucleus. In addition, the MetRS-Arc1p–GluRS complex is important for GluRS trafficking into the mitochondria, which provides an alternative pathway to the GlnRS activity that is lacking in yeast mitochondria [80, 81]. In most bacteria, archaea and organelles of many eukaryotes, including yeast mitochondria, Gln-tRNAGln is generated through an indirect pathway, whereby a nonspecific GluRS synthesizes Glu-tRNAGln, and then an amidotransferase converts it to Gln-tRNAGln. However, the yeast mitochondrial GluRS cannot mischarge tRNAGln. Instead, the cytoplasmic GluRS is transported to the mitochondria to charge the mitochondrial tRNAGln with glutamate. The cellular level of Arc1p controls the translocation of GluRS. Interestingly, the expression of Arc1p is downregulated when yeast switches from fermentation to respiratory metabolism, which comes with a high demand for mitochondrial protein synthesis.
In addition, Arc1p is potentially involved in tRNA maturation and export from the nucleus. Deletion of Arc1p showed synthetic lethality when combined with the deletion of the tRNA transporter Los1, which is responsible for transporting mature tRNAs from the nucleus to the cytosol [23, 83]. Overall, this prototypic aaRS complex serves to anchor GluRS and MetRS in the cytosol and to facilitate the aminoacylation function of the two synthetases by providing them with an additional tRNA-binding module; similar roles are found in the more developed complexes of higher species.
In ascending the tree of life, eukaryotes evolved from single-celled to multicellular organisms, which allows for the differentiation of cells to have specialized roles. Compared to fungi, the sizes of aaRSs in C. elegans are substantially increased by the addition of appended domains. Extensive interactions between new domains present in C. elegans, but not in fungi, appear to facilitate the formation of the aaRS complex in C. elegans, which shares 7 aaRS components with the most highly evolved mammalian MSC. A “pull-down” assay revealed that MetRS associates with a complex that contains an MSC p38/AIMP2 homologue and 8 aaRSs: LeuRS, IleRS, GluRS, ValRS, MetRS, GlnRS, ArgRS and LysRS (Figure 9). The only exception is ValRS, which is found in the C. elegans complex but not in the mammalian MSC. No appended domain is found in C. elegans ValRS, and what enables ValRS to be associated with the C. elegans aaRS complex remains unclear. On the other hand, 2 aaRS components of the modern MSC (AspRS and ProRS) and 2 accessory factors (p43/AIMP1 and p18/AIMP3) are missing in the C. elegans complex. Interestingly, C. elegans MetRS has a C-terminal extension (~320 aa) that contains a leucine zipper and an EMAPII domain and shares extensive sequence homology with mammalian MSC p43/AIMP1. Functional studies have confirmed that human p43 can substitute for the C-terminal extension of C. elegans MetRS in supporting its in vivo activity. Therefore, C. elegans MetRS can be viewed as a combination of MetRS and p43/AIMP1.
Although the C. elegans aaRS complex and the modern MSC share a significant degree of homology, some evidence suggests that not all interactions within the complexes have arisen through convergent evolution. Recent phylogenetic analysis of the GluRS and ProRS genes showed that the fusion of the two genes into one bifunctional GluProRS gene appeared before the Bilateria, and that a fission event happened in C. elegans that separated the EPRS gene back into the two genes. This analysis supported the idea that the distinct C. elegans aaRS complex is, at least with respect to some of its components, the result of divergent evolution.
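The component comparison between the C. elegans complex and the mammalian MSC described above can be checked with simple set operations. This is only a bookkeeping sketch based on the component lists given in this review; it assumes that the bifunctional human GluProRS is counted as two activities (GluRS and ProRS) so that it can be compared with the separate C. elegans enzymes, and the variable and function names are illustrative.

```python
# aaRS components of each complex, as listed in the text.
C_ELEGANS_AARS = {
    "LeuRS", "IleRS", "GluRS", "ValRS", "MetRS", "GlnRS", "ArgRS", "LysRS",
}
# Assumption: the bifunctional GluProRS is split into GluRS and ProRS.
HUMAN_MSC_AARS = {
    "LysRS", "ArgRS", "GlnRS", "AspRS", "MetRS", "IleRS", "LeuRS",
    "GluRS", "ProRS",
}

# Non-synthetase (scaffold) components.
C_ELEGANS_SCAFFOLDS = {"p38/AIMP2"}
HUMAN_MSC_SCAFFOLDS = {"p43/AIMP1", "p38/AIMP2", "p18/AIMP3"}


def compare_complexes(first, second):
    """Return shared and complex-specific components as sorted lists."""
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


aars_comparison = compare_complexes(C_ELEGANS_AARS, HUMAN_MSC_AARS)
scaffold_comparison = compare_complexes(C_ELEGANS_SCAFFOLDS, HUMAN_MSC_SCAFFOLDS)

if __name__ == "__main__":
    print("shared aaRSs:", len(aars_comparison["shared"]), aars_comparison["shared"])
    print("C. elegans only:", aars_comparison["only_first"])
    print("human MSC only:", aars_comparison["only_second"])
    print("scaffolds missing in C. elegans:", scaffold_comparison["only_second"])
```

Under this counting convention the sets reproduce the figures quoted in the text: seven shared synthetases, ValRS specific to C. elegans, and AspRS, ProRS, p43/AIMP1 and p18/AIMP3 specific to the mammalian complex.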
It has long been proposed that the MSC is a ubiquitous cellular component of higher eukaryotes. Its presence has been documented in various higher eukaryotes including rabbits, sheep, bovines, mice, rats and humans, as well as in flies (Drosophila), and genes encoding the three MSC scaffold proteins (p43, p38 and p18) are found in all representative species from insects to humans. Also named aminoacyl-tRNA synthetase interacting multi-functional proteins 1, 2 and 3 (AIMP1, 2, 3), these 3 scaffold proteins interact with each other and with the 9 aaRS components (LysRS, ArgRS, GlnRS, AspRS, MetRS, IleRS, LeuRS and the bifunctional GluProRS, which contributes both GluRS and ProRS) to promote assembly of the MSC, which has a molecular weight of 1.5 million Da (Figure 9). The function of the MSC is unclear, but it has been proposed to facilitate channeling of aminoacylated tRNA to the ribosome during protein synthesis, and to provide a cellular reservoir of various non-canonical functions of its synthetase and non-synthetase components that can be mobilized in response to certain stimuli or environmental cues [25, 88].
The stoichiometry of the MSC components remains largely consistent from Drosophila to human [86, 89] (Figure 10). However, the stoichiometry appears to vary with cellular conditions. For example, one study reported that the number of p43/AIMP1 molecules increased from 2 to 4 per MSC when the protein was overexpressed in the cell. Also, when the cellular level of methionine decreased, the amount of MetRS in the MSC was observed to double to 2 molecules per MSC.
A number of studies suggest that whereas the MSC serves as a reservoir for almost half of the cellular tRNA synthetases, it also controls the flow of synthetases between their canonical and noncanonical functions [88, 92]. As evidence for the latter function, many MSC components have been reported to be released from the complex under certain conditions. For example, LysRS is released upon antigen-IgE induced mast cell activation; EPRS is released for the resolution of IFN-γ-related inflammation; GlnRS is released during Fas-ligand triggered apoptosis. Interestingly, even the scaffold proteins can be released. MSC p43/AIMP1 is released and secreted to act as a cytokine under stress conditions; MSC p38/AIMP2 is released upon DNA damage and translocated to the nucleus to activate p53, FBP and TRAF2 [96–98]. MSC p18/AIMP3 is released under UV radiation and is also translocated to the nucleus [99, 100].
How the MSC is assembled through various interactions, and how the assembly allows for specific release of its individual components, have been topics of interest for more than two decades. Early studies employing gene knockdowns of each component and crosslinking approaches have mapped out the interactions within the MSC [30, 31, 101–103]. Of interest for understanding the evolution of this complex, certain interactions found in the early yeast complex are conserved in the MSC. For example, the interactions between the GST domains of MetRS, Arc1p and GluRS in the yeast complex are likely to be maintained in the MSC between the GST domains of MetRS, p38 and p18, as revealed by crystal structure analysis of human MSC p18/AIMP3. On the other hand, the MSC p43/AIMP1 and p38/AIMP2 are proteins newly invented in higher eukaryotes, and are responsible for a number of interactions specific to the MSC.
The MSC component p38/AIMP2 is the core of the MSC, and directly interacts with most of the MSC components, including LysRS, GlnRS, AspRS, GluProRS, and IleRS as well as p43 and p18 [30, 90, 101, 104]. Depletion of MSC p38 leads to complete disruption of MSC [30, 105]. Mapping studies indicated that p38/AIMP2 interacts with other MSC components in a linear and sequence-dependent manner. The N-terminal end of p38/AIMP2 interacts with LysRS; the following coiled-coil region interacts with p43/AIMP1 and ArgRS; a less structured linker interacts with GlnRS; and finally, the C-terminal GST domain binds to p18/AIMP3, AspRS, MetRS and other aaRSs (Figure 10).
Among those interactions, the LysRS and p38 interaction appears to be most critical for the stability of the MSC. Sequence alignment shows that the LysRS-interacting residues in p38 are highly conserved, supporting the idea that the MSC assembly, especially the LysRS:p38/AIMP2 interaction, is conserved in all higher eukaryotes. In functional tests, gene knock-downs of LysRS led to MSC disruption and subsequent degradation of MSC components. A further study, using recombinant proteins, reconstituted the subcomplex of LysRS:p38 in vitro. Surprisingly, the LysRS:p38 subcomplex showed a (2:1)x2 stoichiometry, with each p38 subunit binding to one LysRS dimer through its N-terminal sequences. Because p38 itself dimerizes through its C-terminal GST domain, this explains the long observed stoichiometry of 4 subunits of LysRS present in the MSC, the highest among all MSC components. Presumably, this arrangement, in which two LysRS dimers bind to p38 independently of each other, allows for specific release of one LysRS dimer in response to certain stimuli, while the other LysRS dimer maintains the stability of the MSC. This novel stoichiometry was later confirmed by a high resolution co-crystal structure of human LysRS-p38, the first subcomplex structure of the MSC. The structure shows that the N-terminal 32 residues of p38 are composed of two short motifs, which sequentially bind to two symmetric grooves of the dimeric LysRS.
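The (2:1)x2 arithmetic can be made explicit with a small bookkeeping sketch. It only restates the subunit counts described in this section; the constant and function names are hypothetical and chosen for illustration, and the sketch is not a model of the complex itself.

```python
# Illustrative bookkeeping only; all names here are hypothetical.
LYSRS_PER_P38 = 2        # each p38 N-terminus binds one LysRS dimer (2 subunits)
P38_PER_SUBCOMPLEX = 2   # p38 dimerizes through its C-terminal GST domain


def lysrs_subunits(bound_dimers: int) -> int:
    """Return the number of LysRS subunits held when `bound_dimers` LysRS dimers are bound."""
    if not 0 <= bound_dimers <= P38_PER_SUBCOMPLEX:
        raise ValueError("bound_dimers must be between 0 and the number of p38 subunits")
    return bound_dimers * LYSRS_PER_P38


# Fully assembled subcomplex: (2 LysRS : 1 p38) x 2
full = lysrs_subunits(P38_PER_SUBCOMPLEX)

# Hypothesized state after one LysRS dimer is released and the other stays bound
after_release = lysrs_subunits(P38_PER_SUBCOMPLEX - 1)

print(full, after_release)
```

Under these assumptions the fully assembled subcomplex carries 4 LysRS subunits, and the proposed release of a single dimer would leave 2 still bound to p38.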
Structural metamorphosis of aaRSs has occurred during evolution by the addition of new domains, and has played a critical role in the activation and regulation of the expanded functions of aaRSs. Different functions of an aaRS may be represented by distinct conformations of the same aaRS, and each aaRS may be viewed as a collection of various conformational states that can be converted from one to the other. The various conformations of aaRS provide the structural basis for obtaining different interaction partners in different cellular or extracellular milieus that dictate its functionality, and provide ways of regulation in response to environmental stimulus or developmental cues.
The conformational change of an aaRS protein can be achieved in various ways. One mechanism is proteolysis, which removes part of the protein sequence and thereby exposes new areas that were masked or hindered in the conformation of the full-length protein. Such conformational resection can also be achieved by alternative splicing at the mRNA level, with subsequent translation of the splice variants into truncated protein sequences. In addition, alternative splicing-based changes are, in principle, capable of creating bona fide new structures. Other mechanisms that generate alternative conformations without a resection are exemplified by posttranslational modifications and by disease-associated missense mutations, which have been identified in the human population. A conformational change in a protein may further trigger a shift in its structural organization at the quaternary level.
Proteolysis has been observed with several human aaRSs at specific regions. Those regions are usually linkers that join the evolutionarily conserved aaRS enzyme core with a new domain incorporated at a later stage in evolution. Interestingly, these linkers are usually disordered in crystal structures of aaRSs or, for those without crystal structure information, are predicted to be unstructured. High flexibility of the linkers may render them more accessible to various proteases. Moreover, proteolysis that removes an appended domain from the core enzyme has been shown to be associated with activation of a non-canonical function of the aaRS. For example, the EMAP-II-like domain appended to human TyrRS is linked to the core enzyme (named mini-TyrRS) via an unstructured loop of 22 amino acids (D343-I364) that is disordered in the crystal structure. This loop is cleaved by at least two proteases, plasmin and leukocyte elastase, with different sequence specificities (Figure 11A) [17, 109]. The cleavage at this loop activates the cytokine activities of both mini-TyrRS and EMAP-II, which are otherwise mutually inhibited in the context of the full-length protein. Another example is human TrpRS, which is appended with a helix-turn-helix WHEP domain. The WHEP domain is linked to the core enzyme of TrpRS via a 29-residue loop (G55-D83) in which 21 residues (D61-E81) were disordered in the crystal structure. Similar to TyrRS, the loop contains the cleavage site for both plasmin and leukocyte elastase [49, 110], which cleave off the WHEP domain from the core enzyme to activate its angiostatic activity (Figure 11B).
Addition or removal of an appended domain might not dramatically affect the conformation of the core enzyme. This scenario has been demonstrated by human TrpRS, where the core enzyme adopts essentially the same conformation as a stand-alone protein and as part of the full-length protein. However, when an appended domain is removed, the overall conformation of an aaRS must be affected and certain areas may become exposed. In the case of human TyrRS, removal of the EMAP II domain exposes the ELR motif that is critical for mediating the angiogenic and cytokine-like activity of mini-TyrRS. As for human TrpRS, removal of the WHEP domain opens the active site that has both a Trp and an ATP binding pocket. The active site is used to bind to two Trp residues near the N-terminus of VE-cadherin, and the binding blocks the homophilic interaction between VE-cadherins that is critical for angiogenesis [48–50]. Importantly, a flexible and long linker region, like that found in TyrRS and TrpRS, would allow for the incorporation of more than one protease recognition site to facilitate cleavage at the linker region.
As mentioned above, alternative splicing at the mRNA level can also achieve structural resection of aaRSs. For example, an exon-skipping event generates an mRNA variant of human TrpRS that lacks exon 2. The splice variant is translated into a shorter form of TrpRS (mini-TrpRS) that is missing the first 47 residues and the majority of the WHEP domain [112–114]. Mini-TrpRS, similar to the other WHEP domain-deleted forms of TrpRS generated by proteolysis, exhibits angiostatic activity that is masked in the full-length protein (Figure 12A).
Alternative splicing can also achieve internal deletion, which is not possible by proteolysis. A recent example is an exon-skipping event on human HisRS that removes a large segment of mRNA from exon 3 to exon 10. This event results in the precise deletion of the entire catalytic domain (CD) to make a protein product that directly links the N-terminal WHEP domain to the C-terminal anticodon-binding domain . NMR spectroscopy analysis revealed a dumbbell-like conformation of the splice variant with the WHEP and anticodon-binding domains loosely linked together. Although the conformation of each domain is more or less preserved, the overall tertiary and quaternary structures of the splice variant dramatically differ from that of the full-length HisRS (Figures 12B–D).
In principle, as alternatively spliced mRNAs are translated into new polypeptides, they can generate independent new structures. The new conformations could result from new protein sequences that are being created, for example, by intron retention or frame shift events, or from sequence deletions that may or may not affect the conformation of the rest of the proteins. In case an internal deletion takes place in the middle of a globular domain (rather than at the domain boundary, as in the HisRS case above), a bona fide new structure may be generated that would not be possible to generate by proteolysis.
Posttranslational modifications such as phosphorylation have also been found to control the non-canonical functions of aaRS proteins, presumably through posttranslational modification–based conformational change. A well-studied example is EPRS, the dual functional tRNA synthetase composed of an N-terminal GluRS and a C-terminal ProRS that are linked together through three consecutive WHEP domains. EPRS is a component of the MSC. Upon interferon-γ stimulation, EPRS is phosphorylated at Ser886 (in between the second and the third WHEP domains) and at Ser999 (after the third WHEP domain and before the C-terminal ProRS), and both phosphorylation events are required to trigger the release of EPRS from the MSC. Although the structural change is undefined, phosphorylated EPRS, but not its unphosphorylated form, can facilitate the assembly of a heterotetrameric γ-interferon-activated inhibitor of translation (GAIT) complex that binds to mRNAs with GAIT elements to suppress their translation.
Another example is LysRS. Upon immunological challenge, LysRS in mast cells is phosphorylated at Ser207, which triggers the release of LysRS from the MSC and its subsequent nuclear localization to regulate the transcription factor MITF. In the absence of Ser207 phosphorylation, LysRS is strongly associated with the MSC in a “closed” form that catalyzes the aminoacylation reaction charging lysine onto tRNALys for protein synthesis. However, phosphorylation at Ser207 triggers a distinct conformational change that opens up the structure. As a result, phosphorylated LysRS is released from the MSC and translocated from the cytoplasm to the nucleus, where it binds to MITF and generates diadenosine tetraphosphate (Ap4A) to activate the transcription of MITF target genes. The open conformation can no longer aminoacylate tRNA but has significantly elevated activity in Ap4A production. Therefore, phosphorylation-based conformational change switches the function of LysRS from translation to transcription (Figure 13). Significantly, phosphorylation at a different site, Thr52, translocates LysRS to the cell membrane, where it interacts with the 67LR laminin receptor and enhances laminin-dependent cell migration in breast cancer cells. Therefore, the observation that two phosphorylation events on one LysRS protein lead to two completely different signaling cascades further indicates the high potential of aaRS structural metamorphosis in regulating cellular pathways.
In addition to phosphorylation, other types of posttranslational modifications such as acetylation and neddylation have been found on aaRS [117–119]. Although not yet characterized, the modifications could also induce structural and functional changes on their target aaRSs.
Mutations in aaRS genes have been associated with various diseases, and most prominently with a genetic disorder named Charcot-Marie-Tooth (CMT) disease. CMT disease is the most common heritable peripheral neuropathy affecting approximately 1 in 2,500 people . More than forty genes have so far been associated with CMT through mutations that lead to similar clinical presentations that are characterized by loss of muscle tissue and touch sensation in body extremities . Among them, four genes encode aaRSs (i.e. GARS, YARS, AARS and KARS, encoding GlyRS, TyrRS, AlaRS and LysRS, respectively) and thus make aaRS one of the largest gene families associated with CMT.
Among the four aaRSs linked to CMT, GARS mutations were most frequently found in CMT patients. Eleven mutations in GARS have been linked to an axonal type of CMT (CMT2D), and two separate mutations were found in mice to cause CMT2D–like phenotypes. The mutations are distributed throughout the protein in multiple domains, and do not always affect the enzymatic activity of the synthetase . A gain-of-function mechanism has been clearly demonstrated using the mouse model Nmf249, where the CMT2D–like phenotype that is linked to a spontaneous mutation P234KY cannot be rescued by expression of WT GlyRS .
Crystal structure analysis did not reveal a significant conformational change caused by a CMT2D mutation, presumably because the potential conformational change is restrained by the crystal lattice interactions. However, a study using a solution method (i.e. hydrogen-deuterium exchange analysis) found that different CMT2D–causing mutations induce a similar and dramatic conformational change that opens up the structure to expose a consensus area that is otherwise masked in the WT protein. Based on this study, mutations in this consensus area are hypothesized to be responsible for mediating a gain-of-function interaction that leads to pathological sequelae found in CMT2D patients (Figure 14).
Although a gain-of-function mechanism has not yet been demonstrated with other aaRS proteins linked to CMT, it is possible that a similar mutation-induced structural change may be involved. This consideration is based on the fact that almost all aaRS-associated CMT mutations are dominant, consistent with gain-of-function mutations [126–129]. In addition, CMT-causing mutations are predominantly located near subunit or domain interfaces, locations that are presumably sensitive for triggering conformational change. For example, all 13 CMT2D mutation-linked residues in GlyRS are located near the dimerization interface and approximately half of them make direct dimer interactions. Similarly, CMT-linked residues in LysRS are distributed at the dimer interface [130, 131].
About half of the aaRS family members form a quaternary structure (dimer, tetramer, or heterodimer) in order to be catalytically active for aminoacylation. The most common form of a quaternary structure for aaRSs is a dimer. Interestingly, at least for some aaRSs, the dimer interface is considerably smaller in the human proteins than in their bacterial counterparts. Presumably, the reduced dimer interface corresponds to a higher propensity for subunit dissociation, whereby the resulting monomeric conformation may be associated with functions that are distinct from those of the dimer form. An example is human TyrRS. Although both the monomer and the dimer form of mini-TyrRS can bind to the cell surface receptor CXCR1/2, only the monomer form can induce the migration of PMN cells. Another example is LysRS. The monomeric form of LysRS is suggested to interact with the capsid Gag protein of HIV, which helps the packaging of tRNALys, a primer for viral reverse transcription, into the HIV virion. Interestingly, although monomeric LysRS is inactive for aminoacylation, disruption of LysRS dimerization does not seem to have a major effect on tRNA binding. These examples raise the possibility that monomer-dimer equilibrium is a mechanism to regulate the aminoacylation (dimer) and the novel (monomer) functions of aaRSs.
The catalytically active dimer form not only can dissociate into monomers, but also can further associate to form tetramers (Figure 15). For example, purified human LysRS is found in solution to exist as dimers and as tetramers [107, 130]. Inside the cell, LysRS is a component of the MSC through its interaction with AIMP2/p38, a scaffold protein required for the assembly of the MSC. The binding surface for AIMP2/p38 on LysRS overlaps with the dimer-dimer interface of the LysRS tetramer; therefore, if such tetramers exist in vivo, they may regulate the assembly of the MSC.
A conformational change at the quaternary level can also be combined with, or regulated by, conformational changes caused by other mechanisms. For example, the aforementioned splice variant of HisRS can no longer form dimers as the dimer interface is embedded in the catalytic domain which is ablated in the splice variant . Interestingly, the conformational opening of GlyRS by CMT2D–causing mutations alters the dimerization interface, and either inhibits or promotes dimer formation, depending on the mutation .
The number of genes that can be traced back to form the genetic core of the common ancestor is extremely small (~80). More than half of them encode translation machinery, including tRNA synthetases, ribosomal proteins and their related factors. These are the only members of the proteome that match the aaRS family from an evolutionary standpoint. When compared to these conserved protein families, such as ribosomal proteins, aaRS proteins have shown a high propensity to add new sequences and, especially, new domains during the evolution of higher eukaryotes. Why is the expansion of new domains and sequences unique among the fundamental protein machinery, but universal to the aaRS family? And why are aaRS proteins particularly prominent in their capacities to mediate a broad range of functions beyond translation? These fundamentally intriguing questions remain unanswered. The uniqueness of aaRSs may be connected to some of their special features, such as their long exposure to evolutionary processes, their ubiquitous nature and presence in all life forms, their modular domain architectures, their diverse array of specific amino acid binding pockets, or their unique ability to coordinate translation with other cellular processes. Possibly, the answer lies in the combination of all the features above, and more to be discovered in the future.
Min Guo, Department of Cancer Biology, The Scripps Research Institute, 130 Scripps Way, Jupiter, FL 33410, USA, Email: GuoMin@scripps.edu.
Xiang-Lei Yang, Department of Cancer Biology, The Scripps Research Institute, 10550 N. Torrey Pines Road, La Jolla, CA 92037, USA, Email: xlyang@scripps.edu.