The Karyopherin-β family of nuclear transport factors mediates the majority of nucleocytoplasmic transport. Although each of the 19 Karyopherin-βs transports a unique set of cargos, only three classes of nuclear localization and export signals (NLSs and NESs) have been characterized. The short basic classical-NLS was first discovered in the 1980s and its karyopherin-bound structures were first reported more than 10 years ago. More recently, structural and biophysical studies of Karyopherin-β2-cargo complexes led to the definition of the complex and diverse PY-NLS. Structural knowledge of the leucine-rich NES is finally available more than ten years after the discovery of its recognition by the exportin CRM1. We review recent findings relating to how these three classes of nuclear targeting signals are recognized by their Karyopherin-β nuclear transport factors.
The transport of proteins between the nucleus and the cytoplasm is signal-mediated: nuclear localization and export signals (NLSs and NESs) direct proteins into and out of the nucleus, respectively. The nuclear pore complex (NPC) allows ions, small molecules and many small proteins (<40 kDa) to diffuse between the nucleus and cytoplasm, but larger proteins require the assistance of nuclear transport factors to traverse the NPC. In humans, 19 Karyopherin-β (Kapβ; also known as Importin and Exportin) proteins recognize NLSs and NESs, each functioning as a distinct nuclear import, export or bidirectional transport factor (Table 1) [2,3]. Kapβs share similar molecular weights (90–150 kDa) and isoelectric points (pI = 4.0–5.0) but low sequence identity (10–20%), and all contain 19–20 helical HEAT repeats arranged into superhelical or ring-like structures.
Each Kapβ recognizes a unique set of proteins or RNA, thus creating multiple transport pathways across the nuclear pore complex (NPC). Kapβs also bind nucleoporins (proteins in the NPC), targeting Kapβ-cargo complexes to the NPC for translocation. Kapβ-cargo interactions and transport directionality are regulated by the Ran GTPase nucleotide cycle. RanGTP is concentrated in the nucleus, while RanGDP is concentrated in the cytoplasm. In import pathways, RanGTP and cargos bind Kapβs competitively, allowing cargo binding in the cytoplasm and RanGTP-mediated release in the nucleus. In contrast, in export pathways, RanGTP, cargos and Kapβs bind cooperatively, allowing cargo binding in the nucleus and release in the cytoplasm as Ran-bound GTP is hydrolyzed.
The NLS (and, by extension, the NES) was defined as a portion of a nuclear transport cargo that 1) is necessary for import/export, 2) is sufficient to import/export an unrelated protein, 3) binds the transport factor directly and 4) depends on that transport factor, such that transport is abolished when the transport factor is disabled. These criteria are generally useful until complications arise from complexities in the cell, such as when transport is mediated by multiple signals and/or transport factors, when signals are functional only when appropriately modified or when protein localization is also dictated by nuclear/cytoplasmic retention. Many transport cargos are known for Importin-β (Impβ; also known as Karyopherin-β1), CRM1 (also known as Exportin1 or Xpo1) and Karyopherin-β2 (Kapβ2; also known as Transportin), but far fewer are known for the other 16 human Kapβs [6–10]. Correspondingly, classes of NLSs recognized by Impβ and Kapβ2 and a class of NES recognized by CRM1 have been identified, but common properties of signals for the other Kapβs remain unknown. The first characterized nuclear transport signal is the short lysine-rich classical-NLS, which was discovered in the early 1980s and is recognized by Importin-α (Impα; also known as Karyopherin-α), an adaptor for Impβ [6,7]. More recently, structural analysis of Kapβ2-cargo complexes led to the discovery of a new class of NLS termed the PY-NLS. Finally, many export cargos contain leucine-rich NESs, which are recognized by the export-Kapβ CRM1. Structures of the first CRM1-NES complex were reported in 2009 [11••,12••].
The first protein segments found to direct nuclear localization were identified in SV40 T antigen (PKKKRKV) and nucleoplasmin (AVKRPAATKKAGQAKKKKLD) [6,7]. These sequences were named nuclear localization signals or NLSs and they bind the armadillo (ARM) domain of Impα (Figures 1a and 1b) [2,3,13–15]. The N-terminal Impβ-binding or IBB domain of Impα in turn binds Impβ forming a ternary NLS-Impα/β complex (Figure 1a). The small sizes and relatively simple sequence patterns of these basic NLSs have facilitated identification of similar signals in many proteins. Since the discovery of other Kapβ pathways, the Impα/β system is now known as the classical pathway and its short basic signal the classical-NLS.
Structures of unliganded Impα, its NLS complexes and the Impβ-IBB complex, together with previous biochemical studies, explained the interactions of the three components [16–19]. A portion of the Impα IBB resembles a classical-NLS and binds in an intramolecular fashion to the Impα ARM domain, thus preventing binding of exogenous NLSs. When Impβ binds the IBB, autoinhibition is relieved and the classical-NLS binding site is exposed (Figure 1).
Classical-NLSs contain either one (monopartite classical-NLS) or two (bipartite classical-NLS) clusters of positively charged amino acids. Both NLS variants bind to the Impα ARM domain, which contains ten ARM repeats arranged as a cylindrical superhelix [16,20]. Highly conserved tryptophan-asparagine pairs in ARM repeats 2–4 define the major NLS binding site, while a similar arrangement in ARM repeats 7 and 8 constitutes the minor binding site. The monopartite classical-NLS conforms to the consensus sequence K-K/R-X-K/R (X is any amino acid) [4,21–23]. The signal binds Impα in an extended conformation and interaction patterns are similar at both sites (Figure 1b) [16,20]. Asparagines of the Impα tryptophan-asparagine pairs hydrogen bond with the NLS backbone, tryptophans make hydrophobic interactions with the aliphatic portions of lysine and arginine side chains, and peripheral polar and acidic residues make electrostatic interactions with charged NLS side chains (Figure 1b) [16,20].
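The monopartite consensus K-K/R-X-K/R translates directly into a regular expression. The following is a minimal sketch, assuming a protein sequence given as a plain one-letter string; the function name `find_monopartite_nls` is hypothetical, and a match shows only that a segment fits the pattern, not that it is a functional NLS.

```python
import re

# K-K/R-X-K/R as stated in the text. The lookahead makes the match
# zero-width so that overlapping hits are all reported.
MONOPARTITE = re.compile(r"(?=(K[KR].[KR]))")


def find_monopartite_nls(seq):
    """Return (0-based start, matched 4-mer) for every consensus match."""
    return [(m.start(), m.group(1)) for m in MONOPARTITE.finditer(seq.upper())]


# SV40 T antigen NLS quoted in the text
print(find_monopartite_nls("PKKKRKV"))  # [(1, 'KKKR'), (2, 'KKRK')]
```

The SV40 T antigen signal fits the pattern in two overlapping registers, which is one reason a consensus hit by itself says little about how the peptide actually sits in the Impα binding site.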
Besides the conserved basic residues, sequences flanking the basic clusters of classical-NLSs were also shown to make direct contact with the receptor and modulate Impα binding affinities. In addition, post-translational modifications such as phosphorylation also modulate nuclear import rates. For example, phosphorylation of adjacent Ser111/112 N-terminal to the NLS of SV40 T antigen greatly enhances its nuclear import, while phosphorylation of the nearby Thr124 produces the opposite effect. Fontes et al. have further characterized the molecular and structural basis of phosphorylation in this NLS.
The bipartite classical-NLS contains a small N-terminal cluster of basic residues connected to the larger C-terminal cluster by a linker (e.g. AVKRPAATKKAGQAKKKKLD in nucleoplasmin), following a loose consensus sequence of (K/R)(K/R) X10–12 (K/R)3/5, where (K/R)3/5 represents three lysine or arginine residues out of five consecutive amino acids. The C-terminal cluster binds the major NLS binding site of Impα while the N-terminal cluster binds the minor site [16,20]. Individual basic clusters bind Impα independently and combine to create a functional signal. For example, adding an N-terminal basic cluster to a non-functional monopartite classical-NLS increased binding affinity, resulting in a new functional bipartite NLS. Interactions at both sites are similar to those for monopartite classical-NLSs and the basic clusters are structurally conserved [20,28]. In contrast, connecting linkers are structurally variable.
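Because the (K/R)3/5 term is a count within a window rather than a fixed pattern, the bipartite consensus is easier to express as a short scan than as a single regular expression. The sketch below is one possible reading of the consensus, not an established tool: the function name `find_bipartite_nls` and its parameters are hypothetical, and "three out of five" is interpreted as at least three basic residues in the five positions that follow the linker.

```python
BASIC = set("KR")


def find_bipartite_nls(seq, linker_range=(10, 12), window=5, min_basic=3):
    """Scan for (K/R)(K/R) X10-12 (K/R)3/5.

    Returns (start of N-terminal cluster, linker length, C-terminal window)
    for each match, using 0-based positions.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 1):
        # N-terminal cluster: two consecutive basic residues
        if seq[i] in BASIC and seq[i + 1] in BASIC:
            for linker in range(linker_range[0], linker_range[1] + 1):
                start = i + 2 + linker
                cluster = seq[start:start + window]
                # C-terminal cluster: full window with enough K/R residues
                if len(cluster) == window and sum(c in BASIC for c in cluster) >= min_basic:
                    hits.append((i, linker, cluster))
    return hits


# Nucleoplasmin NLS quoted in the text
for hit in find_bipartite_nls("AVKRPAATKKAGQAKKKKLD"):
    print(hit)
```

For the nucleoplasmin sequence the scan pairs the KR dipeptide with the C-terminal lysine cluster at linker lengths of 10 and 11. Widening `linker_range` would be the natural way to explore the longer linkers discussed next, at the cost of more spurious hits.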
The bipartite classical-NLS linker is traditionally defined as 9–12 residues, but recent studies show that up to 29 amino acids can be present in the linker of a functional bipartite NLS [29•]. Signals with these longer linkers are more dependent on specific amino acid sequences and on the strengths of the basic clusters [29•,30]. Although the linkers are tolerant to mutations, they also contribute to Impα binding affinity [30,31•]. In stronger NLSs, acidic residues and prolines are preferred in the linkers, whereas hydrophobic and basic residues are more prevalent in linkers of weaker NLSs. Optimization of Impα binding and import characteristics led to the design of two bipartite ‘super-NLSs’ that bound Impα with such high affinity as to inhibit its import activity [31•]. Linker residues do confer position-specific effects on import activity, and thus the bipartite classical-NLS is a multipartite signal with energetic contributions from both basic clusters and the linker.
Many proteins imported into the nucleus bind directly to a Kapβ. Sequences that bind Kapβs are diverse, encompassing both structural domains and linear epitopes. For example, Impβ binds and imports numerous proteins directly, without involving Impα. Other than the helical Impβ-binding (IBB) domains of Impα and Snurportin 1 [18,32–34], most Impβ cargos show little sequence or structural similarity. Apart from the structurally similar IBBs, the cargos SREBP2 and the PTHrP segment are structurally diverse and bind at different sites on Impβ [18,32–34]. Thus, general features among cargos that bind Impβ directly cannot be inferred at this time. The other 16 Kapβs are currently known to recognize small collections of diverse cargos, making it extremely difficult to identify or describe their NLSs or NESs. Nevertheless, the large sizes and low sequence identities of Kapβs, coupled with the identification of multiple cargo binding sites for several members, suggest that dozens of classes of nuclear targeting signals have yet to be discovered.
Kapβ2 is a prototypical Kapβ, which binds import cargos directly [35,36]. Approximately 20 RNA binding proteins had been biochemically identified as Kapβ2 import cargos [35–40]. NLSs in hnRNP A1 (the 38-residue M9 sequence), HuR, TAP, hnRNP D and JKTBP showed little sequence similarity [35–39,41]. Such sequence diversity had prevented prediction of NLSs in other Kapβ2 cargos.
Numerous Kapβ2 structures now provide structural snapshots of this pathway (Figures 2a and 2b) [8,42–45]. Ran and cargo sites, in the N- and C-terminal arches of Kapβ2 respectively, are spatially distinct but are thermodynamically coupled by the 62-residue internal H8 loop, which occupies the cargo site and displaces cargo when Ran is bound [8,42,46] (Figure 2b).
Recent structural and biochemical studies revealed several common characteristics among the apparently disparate signals recognized by Kapβ2, unifying them into a new class of NLS termed the PY-NLS. Even though the 15–30 residue PY-NLSs are larger and more complex than the classical-NLSs, they bind Kapβ2 in a similarly extended conformation, indicative of linear and rather long epitopes. However, unlike the classical-NLSs, the diverse PY-NLSs cannot be sufficiently described by a traditional consensus sequence. Instead, they are described by a collection of physical rules that include requirements for intrinsic structural disorder of a large peptide segment, overall basic character, and a set of sequence motifs.
PY-NLS consensus motifs consist of a loose N-terminal hydrophobic or basic motif and a C-terminal RX2–5PY motif (an arginine, followed by two to five residues of any type, followed by proline-tyrosine; Figure 2c). The composition of the N-terminal motif divides the signals into hydrophobic (hPY-NLS) and basic (bPY-NLS) subclasses. The former contain consecutive, predominantly hydrophobic residues that can be represented by the loose consensus Φ1-G/A/S-Φ3-Φ4, where Φ is a hydrophobic residue (Figure 2c). The equivalent region in the bPY-NLS is a 4–20-residue segment that is enriched in basic residues. Kapβ2-bound PY-NLSs of hnRNP A1, hnRNP M, hnRNP D, JKTBP and TAP/NXF1 show structurally convergent consensus regions that are separated by linkers, highlighting the multipartite nature of the signal and confirming consensus designation of the motifs despite weak sequence similarity [43,45] (Figure 2c). The large, flat and mixed acidic/hydrophobic cargo-binding site of Kapβ2 allows recognition of the diverse PY-NLS sequences. The hydrophobic motif of the hPY-NLS and the aliphatic portions of basic residues in the bPY-NLS bind the hydrophobic patches, while charged side chains of the bPY-NLS bind the large acidic surface.
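The two hPY-NLS sequence motifs can be sketched as a pair of regular expressions with an ordering constraint. This illustration rests on several assumptions that are not in the text: the residues counted as hydrophobic (`PHI`) are a choice made here, the function name `find_hpy_nls_candidates` is hypothetical, no limit is placed on the linker between the motifs, and the test peptide is an invented toy sequence rather than a real cargo. The disorder and overall-charge rules, which the text lists as equally important, are not checked.

```python
import re

PHI = "LIVMFWY"  # ASSUMPTION: residues treated as hydrophobic (not specified in the text)

# Loose N-terminal hydrophobic motif: PHI-G/A/S-PHI-PHI
H_MOTIF = re.compile(rf"[{PHI}][GAS][{PHI}][{PHI}]")
# C-terminal motif: R, then 2-5 residues of any type, then PY
PY_MOTIF = re.compile(r"R.{2,5}PY")


def find_hpy_nls_candidates(seq):
    """Pair each R-X(2-5)-PY match with every hydrophobic motif N-terminal to it.

    Returns (h_start, h_motif, py_start, py_motif) tuples with 0-based starts.
    Matches of each motif are non-overlapping, as returned by finditer.
    """
    seq = seq.upper()
    candidates = []
    for py in PY_MOTIF.finditer(seq):
        # endpos restricts the search to the region before the PY motif
        for h in H_MOTIF.finditer(seq, 0, py.start()):
            candidates.append((h.start(), h.group(), py.start(), py.group()))
    return candidates


# Invented toy peptide, for illustration only
print(find_hpy_nls_candidates("GSFGLMKGGNNGGRSSGPYGG"))
```

A basic-subclass check would replace `H_MOTIF` with a count of K/R residues in the 4–20 residues preceding the C-terminal motif; the enrichment threshold for that count would also have to be assumed.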
Mutagenesis and thermodynamic analysis revealed physical properties that govern PY-NLS binding affinity [47••]. The PY-NLS is composed of three linear epitopes that can each accommodate substantial sequence diversity (Figure 2c). For example, although tyrosine appears conserved in the C-terminal RX2–5PY motif, other amino acids, especially hydrophobic ones, are well tolerated in this position. Instead of the PY motif, cargos Nab2 and HuR have PL and PG dipeptides, respectively [47••]. Interestingly, the Kapβ2 binding partner ELYS, which functions in NPC assembly, likely has a PV dipeptide, and the ciliary transport cargo KIF17 likely has a PL dipeptide in its ciliary localization signal (CLS), which also binds Kapβ2. The three PY-NLS epitopes are energetically modular and each epitope can contribute very differently to total binding energy in different PY-NLSs [47••]. Signal diversity can be achieved through combinatorial mixing of energetically weak and strong motifs while maintaining affinity appropriate for nuclear import function. These concepts are illustrated by an extremely tight-binding peptide inhibitor that was designed by combining strong motifs from different PY-NLSs [43,47••]. Nuclear import function is lost because RanGTP can no longer dissociate the inhibitor.
The physical rules that describe the PY-NLS (structural disorder, positive charge and weak consensus motifs) were used together in a preliminary bioinformatics endeavor that predicted ~100 candidate Kapβ2 cargos. Of these, 13 were previously known Kapβ2 cargos, and many new PY-NLSs and several new cargos have since been validated [8,47••]. However, the narrow sequence patterns used in the initial search suggest that these predicted cargos are likely only a fraction of all PY-NLS-containing proteins. Unfortunately, PY-NLS diversity will severely limit the usefulness of conventional sequence-based searches, and comprehensive identification of the signal in genomes may be extremely challenging. The core problem is that binding energy is distributed across three epitopes in many different ways. Thus, simply relaxing sequence constraints will also increase “noise” and result in many wrong answers. However, the relatively small motif sizes and their energetic independence may allow the problem to be divided into manageable pieces. If binding energies for individual PY-NLS epitopes could be predicted accurately, one could empirically determine combinations that are functional. An approach that combines bioinformatics, structural modeling and prediction of binding energies may be a solution.
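The idea of combining per-epitope binding energies can be made concrete with a small calculation. The sketch below assumes, as a simplification, that the three epitopes contribute additively to the binding free energy, and converts the total to a dissociation constant with the standard relation Kd = exp(ΔG/RT). All energy values, the affinity cut-offs and the function names are hypothetical placeholders chosen for illustration; none are measured values from the studies cited.

```python
import math
from itertools import product

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)
T_KELVIN = 298.15  # 25 degrees C


def kd_from_dg(dg_kcal_per_mol):
    """Dissociation constant (M) from binding free energy: Kd = exp(dG / RT)."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * T_KELVIN))


# HYPOTHETICAL per-epitope binding energies in kcal/mol (illustration only)
EPITOPE_DG = {
    "epitope1": {"weak": -1.0, "strong": -4.0},
    "epitope2": {"weak": -1.5, "strong": -4.5},
    "epitope3": {"weak": -2.0, "strong": -5.0},
}


def classify(kd, too_tight=1e-9, too_weak=1e-5):
    """HYPOTHETICAL cut-offs: too tight to be released, too weak to bind, or in between."""
    if kd < too_tight:
        return "too tight"
    if kd > too_weak:
        return "too weak"
    return "candidate"


def enumerate_combinations(epitope_dg=EPITOPE_DG):
    """Sum epitope energies (additivity ASSUMED) for every weak/strong combination."""
    names = list(epitope_dg)
    results = []
    for strengths in product(*(epitope_dg[n] for n in names)):
        total = sum(epitope_dg[n][s] for n, s in zip(names, strengths))
        kd = kd_from_dg(total)
        results.append((strengths, total, kd, classify(kd)))
    return results


for strengths, total, kd, label in enumerate_combinations():
    print(strengths, f"dG={total:.1f} kcal/mol", f"Kd={kd:.1e} M", label)
```

With these placeholder numbers, the all-strong combination falls in the "too tight" class, mirroring the designed inhibitor that RanGTP cannot dissociate, while mixed combinations land in the intermediate range. Real use would require measured or reliably predicted energies for each epitope.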
The study of nuclear export signals or NESs has lagged behind that of nuclear import. There is only one known class of NES at this time. The first few identified NESs are short leucine-rich segments named leucine-rich NESs. Proteins harboring these NESs are recognized and exported by CRM1 [50–53].
The CRM1-specific inhibitor Leptomycin B has facilitated identification and experimental validation of >200 broadly functioning cargos [54,55]. Leucine-rich NESs were usually identified by visual inspection of protein sequences and confirmed by in vivo nuclear export assays and/or in vitro CRM1 binding. Although early NESs in HIV Rev (LPPLERLTL) and protein kinase inhibitor A (LALKLAGLDI) are rich in leucines, many leucine-rich NESs contain other hydrophobic residues [9,10]. The 10–15-residue signal is therefore described as three or four regularly spaced hydrophobic residues (consensus sequence of ϕ1-X2–3-ϕ2-X2–3-ϕ3-X-ϕ4, where ϕn represents L, V, I, F or M, and X can be any amino acid) [55–57]. The consensus describes a helical pattern but is quite ambiguous and will identify segments found in most helix-containing proteins. Like the PY-NLS, the leucine-rich NES is a complex and diverse signal that cannot be sufficiently described by a consensus sequence alone. Despite its non-specific pattern, the NES consensus describes only ~40% of known NESs [55,58•]. To describe NES sequences more accurately, Kosugi et al. generated a large number of NES peptides in a random peptide library screen and delineated multiple distinct consensus sequences, which now collectively describe more than 86% of known functional NESs [58•].
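The traditional NES consensus can be written as a regular expression, which also makes its permissiveness easy to see. This is a minimal sketch of the single pattern quoted above, not of the multiple consensus classes delineated by Kosugi et al.; the function name `find_nes_candidates` is hypothetical, and a match says nothing about whether the segment is accessible or exported.

```python
import re

PHI = "LVIFM"  # hydrophobic positions as defined in the text

# phi1-X(2-3)-phi2-X(2-3)-phi3-X-phi4
NES_CONSENSUS = re.compile(rf"[{PHI}].{{2,3}}[{PHI}].{{2,3}}[{PHI}].[{PHI}]")


def find_nes_candidates(seq):
    """Return (0-based start, matched segment) for non-overlapping consensus matches."""
    return [(m.start(), m.group()) for m in NES_CONSENSUS.finditer(seq.upper())]


# The two early NESs quoted in the text
print(find_nes_candidates("LPPLERLTL"))   # [(0, 'LPPLERLTL')]
print(find_nes_candidates("LALKLAGLDI"))  # [(0, 'LALKLAGLDI')]
```

Both early NESs fit the pattern. Because only four positions are constrained, and each to five common residues, the same expression will also match many buried helical segments.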
Structures of CRM1 bound to cargo Snurportin 1 (SNUPN), with and without RanGTP have recently been solved [11••,12••,59•] (Figures 3a and 3b). With the exception of the Ran binding regions, both CRM1 structures are virtually identical. CRM1 has 20 HEAT repeats that are arranged into a ring (Figures 3a–c). RanGTP binds inside the ring, contacting H1-H9, its C-terminus and the long β-hairpin loop of H9 [12••] (Figure 3a). Unlike other karyopherins, which all seem to bind cargos using their concave surfaces, SNUPN binds to the convex side of CRM1 [11••,12••].
SNUPN binds CRM1 in a multipartite manner through its N-terminus, nucleotide binding domain (NBD) and C-terminal tail [11••,12••]. The first 16 residues of SNUPN contain a leucine-rich NES [11••], which binds to a hydrophobic groove at HEAT repeats H11 and H12 of CRM1 (Figure 3d). The N-terminal location and the 30-residue loop that follows allow the NES to protrude away from the rest of SNUPN and be accessible for CRM1 binding. The NES forms a short 3-turn amphipathic helix, followed by three residues in an extended conformation. Hydrophobic side chains are aligned for placement into the hydrophobic groove, leaving polar residues exposed on the opposite side. Interaction between the NBD and CRM1 is mostly electrostatic, with the NBD basic surface binding to an acidic CRM1 surface that is adjacent to the NES-binding groove.
Leptomycin B alkylates Cys 528 of human CRM1, which is located in the NES-binding groove [11••,12••]. The inhibitor likely occupies the hydrophobic groove and prevents NES binding. Binding of Ran effector RanBP1 to CRM1 and RanGTP also displaces the NES. Koyama and Matsuura showed that RanBP1 binding shifts the internal H9 loop of CRM1 to the concave side of the NES-binding groove (H11 and H12), driving helices there together and closing the hydrophobic groove to dissociate the NES [60•] (Figures 3d and 3e). It is unclear if the NES groove is similarly closed in unliganded CRM1.
The helical structure of the leucine-rich NES reflects its consensus sequence and may be an important and prevalent physical characteristic of the signal. Interestingly, the most active NES sequences identified in an export activity assessment study bear the helical consensus of ϕ1-X3-ϕ2-X2-ϕ3-X-ϕ4. The same sequence pattern was found to be the most abundant among natural NESs [58•]. However, the CRM1 groove likely also accommodates non-helical NESs, as a few of the new consensus sequences identified by Kosugi et al. are not helical patterns [58•]. The high proline content of the well-known NES from HIV Rev (LPPLERLTL) suggests that this particular signal either is a severely distorted helix or adopts an extended conformation. Additional structures of leucine-rich NESs bound to CRM1 will be required to map the repertoire of NES conformations that are recognized by CRM1.
The ambiguous NES consensus, prevalence of the sequence patterns in the genome and the high likelihood that hydrophobic sequences are found in the interior of proteins all suggest that many identified and predicted leucine-rich NESs may be false positives. It is critical that a nuclear targeting signal be accessible, within the full-length cargo protein, for interactions with its Kapβ. Regions surrounding NESs tend to be over-represented by polar and acidic residues, perhaps reflecting protein surface locations [55,58•]. Leucine-rich NESs located at protein termini, in structurally disordered regions or flanked by long loops, are more likely to be accessible to CRM1.
Even though 19 different Kapβs mediate nuclear transport of proteins, only three classes of nuclear targeting signals have been characterized. The Impα/β heterodimer recognizes the monopartite and bipartite classical-NLSs, Kapβ2 recognizes the PY-NLS and CRM1 recognizes the leucine-rich NES. The monopartite classical-NLS is a compact signal with a well-defined sequence pattern, but the bipartite classical-NLS is larger and more complex. The PY-NLS and the leucine-rich NES are very diverse signals that cannot be sufficiently described by traditional sequence patterns but are instead described by collections of physical rules. One of the most important requirements for a nuclear targeting signal is its accessibility within the cargo protein, indicating that the signal either lacks secondary structure or is located in a disordered or flexible region. Dozens of nuclear targeting signals have yet to be discovered and many are likely poorly defined in sequence. The concept of defining nuclear targeting signals with a series of weakly restrictive rules may be generally true.
We thank Bostjan Kobe, Anita Corbett, Dirk Görlich and Thomas Güttler for discussions. This work is funded by the National Institutes of Health (R01-GM069909 and GM007062), Welch Foundation (I-1532), Leukemia and Lymphoma Society Scholar award and UT Southwestern Endowed Scholars Program.