Karyopherinβ proteins (Kapβs; Importins/Exportins) mediate the majority of nucleocytoplasmic protein transport. There are 19 known Kapβs in human and 14 in yeast [1]. Kapβs bind substrates through nuclear localization or export signals (NLSs or NESs) and transport them through the nuclear pore complex, and Ran GTPase regulates Kapβ–substrate interactions [3]. Ten Kapβs are known to function in nuclear import, each recognizing at least one distinct NLS.
The best-known NLS is the short, basic, classical NLS, which is recognized by Kapα/Kapβ1 [4], and this pathway is functionally conserved from human to yeast [7]. Classical NLSs can be divided into monopartite and bipartite NLSs. Monopartite NLSs contain a single cluster of basic residues, whereas bipartite NLSs contain two clusters of basic residues separated by a 10–12 amino acid linker. Thermodynamic dissection by scanning alanine mutagenesis of monopartite NLSs from the SV40 large T antigen (PKKKRKV) and the c-myc proto-oncogene (PAAKRVKLD) [9] confirmed a previously determined consensus sequence of K(K/R)X(K/R) [8]. Binding energies of these small signals are dominated by a single lysine residue, in the third position of the SV40 large T antigen NLS and in the fourth position of the c-myc NLS, which makes numerous interactions with Kapα [9]. Thus, the monopartite classical NLS is a relatively small motif whose binding energy is concentrated in a stereotypical fashion within a short sequence. Although numerous structures are available for bipartite NLSs [13], a thorough thermodynamic analysis of this subclass is not available, and its consensus (one example is KRX(10–12)KRRK, where X(10–12) denotes a linker of 10–12 residues) is less well defined than that of the monopartite NLS. Furthermore, a nonfunctional SV40 NLS mutant was rescued by a bipartite-like addition of a two-residue N-terminal basic cluster [9], suggesting that bipartite classical NLSs can accommodate larger sequence diversity than their monopartite counterparts.
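To make the consensus notation concrete, the following minimal sketch translates the two patterns quoted above into regular expressions and scans the SV40 and c-myc sequences given in the text. The function name `find_motifs` and the pattern constants are illustrative choices, not part of any published tool, and a regex hit says nothing about binding affinity.

```python
import re

# Monopartite classical NLS consensus K(K/R)X(K/R), as quoted in the text.
# The lookahead wrapper lets overlapping occurrences be reported.
MONOPARTITE = re.compile(r"(?=(K[KR].[KR]))")

# One example bipartite pattern from the text: KR, a 10-12 residue linker, KRRK.
BIPARTITE = re.compile(r"(?=(KR.{10,12}KRRK))")


def find_motifs(pattern, sequence):
    """Return (0-based start, matched text) for every hit, including overlaps."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    examples = [("SV40 large T antigen", "PKKKRKV"), ("c-myc", "PAAKRVKLD")]
    for name, seq in examples:
        print(name, find_motifs(MONOPARTITE, seq))
```

Both monopartite signals contain at least one window matching the consensus. The bipartite pattern is deliberately the single example given above; a broader bipartite search would need a looser pattern, which is consistent with the weaker consensus described in the text.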
Recently, structural and biochemical analyses of human Kapβ2 (Transportin) bound to the hnRNP A1 NLS revealed physical rules that describe Kapβ2's recognition of a diverse set of 20–30-residue-long NLSs that we termed PY-NLSs [16]. These rules are structural disorder of a 30-residue or larger peptide segment, overall basic character, and weakly conserved sequence motifs composed of a loose N-terminal hydrophobic or basic motif and a C-terminal RX(2–5)PY motif, in which X(2–5) denotes two to five residues. The composition of the N-terminal motifs divides PY-NLSs into hydrophobic and basic subclasses (hPY- and bPY-NLSs). The former contain four consecutive predominantly hydrophobic residues, whereas the equivalent region in bPY-NLSs is enriched in basic residues.
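Two of the rules above (the C-terminal motif and overall basic character) can be sketched as a simple scan. This is an illustration only, not the search procedure of reference [16]: the 30-residue window, the crude charge count that ignores histidine, and the test sequence (a synthetic string, not a real protein) are all assumptions, and structural disorder is not assessed.

```python
import re

# C-terminal motif from the text: R, then 2-5 arbitrary residues, then PY.
RX2_5_PY = re.compile(r"(?=(R.{2,5}PY))")


def net_charge(segment):
    """Crude net charge: +1 per K/R, -1 per D/E (histidine ignored; assumption)."""
    positive = sum(segment.count(res) for res in "KR")
    negative = sum(segment.count(res) for res in "DE")
    return positive - negative


def scan_py_nls(sequence, window=30):
    """List candidate segments that end at an R-X(2-5)-PY motif.

    Each hit reports the motif, the segment bounds (0-based, end exclusive),
    and the net charge of up to `window` residues ending at the PY.
    """
    seq = sequence.upper()
    hits = []
    for match in RX2_5_PY.finditer(seq):
        end = match.start() + len(match.group(1))  # index just past the PY
        start = max(0, end - window)
        hits.append({
            "motif": match.group(1),
            "start": start,
            "end": end,
            "net_charge": net_charge(seq[start:end]),
        })
    return hits


if __name__ == "__main__":
    synthetic = "GGKRKGGSGGRSSGPYGG"  # made-up sequence for illustration
    for hit in scan_py_nls(synthetic):
        print(hit)
```

A caller could keep only hits with a positive net charge to approximate the "overall basic character" rule; where to set that threshold is a design choice this sketch leaves open.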
Approximately 100 different human proteins have been identified as potential Kapβ2 substrates [16]. The accompanying table summarizes previously reported validated and potential PY-NLSs. Although many of these potential substrates were predicted by bioinformatics [16] and still need experimental testing, more than 20 have been validated for Kapβ2 binding [16]. Comparison of PY-NLSs validated in vivo and in vitro shows large sequence diversity, which is reflected in weak consensus sequences [16]. Structures of five different Kapβ2-bound PY-NLSs also show substantial variability, with structurally diverse linkers separating the convergent consensus regions [16]. The PY-NLS is significantly larger than the short monopartite classical NLS. The well-defined consensus and concentrated binding energy of the latter may reflect the compactness of the signal. In contrast, the binding energy of the PY-NLS is spread over a much larger sequence. The physical properties of the multipartite PY-NLS may be more similar to those of the less-studied, larger, and sequentially more diverse bipartite classical NLS.
Summary of Validated and Potential PY-NLSs
Diverse PY-NLSs are necessarily described by weak consensus motifs. Therefore, instead of describing a linear recognition motif in the traditional way, with a strongly restrictive consensus sequence, PY-NLSs were described by a collection of individually weak physical rules that together limit sequence space enough to allow reasonable predictions of new Kapβ2 substrates [16]. However, the currently predicted substrates are most likely only a fraction of all PY-NLS-containing proteins, because narrow sequence patterns were used in the initial search to achieve optimal accuracy. In fact, the sequence patterns used [16] were too narrow to predict PY-NLSs in the known substrates HuR, TAP, hnRNP F, and JKTBP-1. The coverage of conventional sequence-based bioinformatics searches is expected to be severely limited by PY-NLS diversity. Although sequence patterns clearly need to be expanded, we do not yet understand the limits of sequence diversity within motifs or how the different motifs may be combined. Knowledge of how binding energy is distributed within PY-NLSs will shape future efforts to decode these highly degenerate signals. Furthermore, a physical understanding of how diverse PY-NLS sequences achieve a common biological function will also provide insights into the many biological recognition processes that involve linear recognition motifs with weak and obscure consensus sequences, such as vesicular cargo sorting and protein targeting to the mitochondria and the peroxisome [28].
The yeast homolog of Kapβ2 is Kap104p (32% sequence identity) [34]. Only two Kap104p substrates, the mRNA processing proteins Nab2p and Hrp1p, are known. Several groups have mapped and validated NLSs of these substrates using both in vivo and in vitro methods to arginine–glycine (RG)-rich regions that were termed rg-NLSs [35]. Little sequence homology was detected between NLSs recognized by Kapβ2 and Kap104p. Furthermore, substrate recognition by the two karyopherins appears nonanalogous, as Kap104p does not recognize human substrate hnRNP A1 [35]. Given the recent physical understanding of Kapβ2–NLS interactions, we seek to examine the evolutionary conservation and energetic organization of signals in this pathway through studies of Kap104p–NLS interactions.
First, we present biochemical and biophysical analyses showing that RG-rich substrates of yeast Kap104p share similar physical characteristics to those of human PY-NLSs. Kap104p recognizes the basic but not hydrophobic PY-NLS subclass, and structural analyses of Kapβ2–NLS complexes suggested the origin of this specificity, enabling prediction of PY-NLS subclass specificity for all eukaryotic Kapβ2s. Thermodynamic analyses of Kap104p–NLS interactions revealed biophysical properties that govern binding affinity of PY-NLSs. These signals contain at least three energetically significant binding epitopes that are also linear motifs. Each linear epitope accommodates significant sequence diversity, and we have characterized some of the limits of this diversity. The linear epitopes are also energetically quasi-independent, a property that is probably due to intrinsic disorder of the free signals. Finally, in different PY-NLSs, a given epitope can vary significantly in its contribution to total binding energy. When combined with multivalency, this energetic variability can amplify signal diversity through combinatorial mixing of energetically weak and strong motifs.
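The notion of energetically quasi-independent epitopes can be illustrated with the standard relation between dissociation constants and free energy. The sketch below is generic textbook thermodynamics rather than the analysis performed in this study; the function names are illustrative and every KD value is hypothetical. If two epitopes are independent, the energy lost in a double mutant should be close to the sum of the losses in the two single mutants, so the coupling term is near zero.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg(kd_mutant, kd_wildtype, temp_k=298.15):
    """Binding energy lost on mutation: ddG = RT ln(KD_mut / KD_wt), in kcal/mol."""
    return R_KCAL * temp_k * math.log(kd_mutant / kd_wildtype)


def coupling_energy(kd_wt, kd_mut_a, kd_mut_b, kd_double, temp_k=298.15):
    """Double-mutant-cycle coupling: ddG(AB) - ddG(A) - ddG(B).

    A value near zero indicates the two mutated epitopes contribute
    independently (additively) to binding.
    """
    return (ddg(kd_double, kd_wt, temp_k)
            - ddg(kd_mut_a, kd_wt, temp_k)
            - ddg(kd_mut_b, kd_wt, temp_k))


if __name__ == "__main__":
    # Hypothetical dissociation constants in molar units (not measured data).
    kd_wt, kd_a, kd_b, kd_ab = 10e-9, 100e-9, 1000e-9, 10000e-9
    print("ddG epitope A:", round(ddg(kd_a, kd_wt), 2), "kcal/mol")
    print("ddG epitope B:", round(ddg(kd_b, kd_wt), 2), "kcal/mol")
    print("coupling:", round(coupling_energy(kd_wt, kd_a, kd_b, kd_ab), 2), "kcal/mol")
```

In this made-up example a tenfold loss in affinity corresponds to roughly 1.4 kcal/mol at 25 °C, and the coupling term is zero because the double-mutant KD was chosen to be exactly additive.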