Most membrane-enveloped viruses bud from infected cells by hijacking the host ESCRT machinery. The ESCRTs are recruited to bud sites by viral proteins that contain short proline-rich motifs (PRMs) known as late domains. The late domains probably evolved by co-opting host PRMs involved in the normal functions of ESCRTs in endosomal sorting and cytokinesis. The solution and crystal structures of PRMs bound to their interaction partners explain the conserved roles of Pro and other residues that predominate in these sequences. PRMs are often grouped together in much larger proline-rich regions (PRRs) of as many as 150 residues. The PRR of the ESCRT-associated protein ALIX autoregulates its conformation and activity. The robustness of different viral budding and host pathways to impairments in proline-based interactions varies considerably. The known biology of proline-rich motif recognition in the ESCRT pathway seems, in principle, compatible with antiviral development, given our increasingly nuanced understanding of the relative weakness and robustness of the host and viral processes.
Many membrane-enveloped viruses hijack the host ESCRT machinery in order to escape from cells (1–6). The ESCRTs are responsible for severing the narrow membrane neck connecting the nascent virion to the plasma membrane (7). ESCRT-dependent viruses recruit the ESCRTs principally via short peptide motifs called “late domains” (8). The three classes of late domain involved in ESCRT recruitment are all proline-rich motifs (PRMs), and have the form PPXY, P(S/T)AP, or LYPXnL. Viral late domain PRMs evolved to mimic the normal functions of PRMs in host physiology. Viruses have had a lot to teach cell biologists, the ESCRT pathway being an excellent case in point. Indeed, the discovery of the viral PRM-ESCRT interactions led to the characterization of the first example of a host-encoded ESCRT-binding PRM. Within cellular regulatory and trafficking proteins, PRMs very often occur within the context of larger proline-rich regions (PRRs). One of the questions addressed in this review is whether PRRs have functions beyond being a collection of contiguous PRMs.
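The three late-domain classes named above can be written as simple sequence patterns. The Python sketch below scans a sequence for them, taking X as any residue and n in LYPXnL as 1 or 3, as specified later in this review. The toy sequence and the function name are hypothetical and serve only as illustration. A pattern match flags no more than a candidate: as discussed for P(S/T)AP, a matching sequence may be buried in a folded domain and unavailable for binding.

```python
import re

# Late-domain classes as written in the text; X = any residue.
LATE_DOMAIN_PATTERNS = {
    "PPXY": r"PP.Y",
    "P(S/T)AP": r"P[ST]AP",
    "LYPXnL": r"LYP(?:.|.{3})L",  # n = 1 or 3
}


def find_late_domains(sequence):
    """Return (motif class, 1-based start, matched text) tuples, sorted by start."""
    hits = []
    for name, pattern in LATE_DOMAIN_PATTERNS.items():
        # Wrapping the pattern in a lookahead reports overlapping occurrences too.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy sequence, not taken from any real protein.
toy_sequence = "GGPTAPGGLYPDLGGPPVYGGLYPAAALGG"
for hit in find_late_domains(toy_sequence):
    print(hit)
```

Such a scan is only a first filter; structural accessibility and binding data are still needed to decide whether a hit is a functional late domain.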
Here, we consider PRRs and PRMs as a central theme in the normal physiology and structural organization of the ESCRT pathway and as potential targets for antivirals. In addition to the long PRRs within ESCRT proteins themselves, we also consider short PRMs. We define a PRM here to be a short segment that has an identified interaction and function and at least one conserved and functionally required Pro. PRMs are common in the sequences of ESCRT proteins and in other cell and viral proteins that bind to ESCRTs, and occur both within larger PRRs and in isolation. In this review, we summarize the accumulated knowledge about what these regions do in the ESCRT pathway, what their structures look like, and prospects for their targetability in antiviral drug design.
Before delving into the biology of PRRs and PRMs in the ESCRT pathway, we will briefly review why Pro is such a special amino acid. Pro is unique among the amino acid constituents of proteins because the Cδ of its side-chain is cyclized to its backbone amide nitrogen. This has several consequences for the structural and binding properties of Pro (9). The backbone amide of Pro cannot donate a hydrogen bond. Pro is thus a disruptor of hydrogen bonding in both α-helices and β-sheets, even though individual Pro residues can still adopt these conformations. On the other hand, Pro is strongly favored at certain positions of type I and type II β-turns. The backbone φ torsion angle of Pro is restricted by the cyclization. The bulk of the cyclized N-Cδ linkage constrains the conformational space of the residue preceding the Pro, disfavoring the α-helical conformation. When multiple Pro residues occur consecutively, their conformations are so restricted that they become essentially locked into the type II polyproline (PPII) helical conformation. Uninterrupted polyPro tracts are thus far more rigid than other sequences. Finally, Pro is the only residue for which the cis peptide conformation occurs with any frequency. Thus PRRs have unique conformational constraints that govern the structures of proteins that contain them, but since these segments are typically removed from constructs used in crystallography and NMR, their structural roles are underappreciated.
In addition to these unique conformational restrictions, Pro has a hydrophobic ring which can form complementary packing interactions with the flat rings of the aromatic side-chains Phe, Tyr, and Trp, the flat guanidino group of Arg, and of course, other Pro side-chains. The rigidity of PRMs has important consequences for binding. Because the unbound PRM has fewer conformational degrees of freedom, these sequences lose less conformational entropy upon complex formation. Because the entropic free energy penalty for binding is smaller, fewer molecular contacts are needed to yield the same binding affinity, as compared to a non-Pro based sequence. Unlike the sometimes bewildering and uncooperative full-length PRRs, the PRMs, once whittled down to short peptides, readily give their secrets up to structural and molecular biologists.
ALIX has a key role in ESCRT-mediated membrane abscission in cytokinesis, and mediates the LYPXnL motif-driven budding of viruses by the ESCRTs. ALIX also seems to have roles in apoptosis and endocytosis, and it remains unclear whether and how these latter functions are related to the ESCRT-associated roles of ALIX (10). ALIX contains an N-terminal Bro1 domain that recruits and/or regulates the membrane scission-promoting ESCRT-III subunit CHMP4. This is followed by a central V domain that is the locus for binding to the LYPXnL motif, discussed below as one of the ESCRT-interacting PRMs. Most important for purposes of this review is the C-terminal 150-amino-acid PRR (Fig. 1). The ALIX PRR has a 33% Pro content and is also enriched in Gly, Ala, Ser, Thr, Tyr, Gln, and Asn, all typical of low complexity sequences.
The ALIX PRR contains five PRMs of known function. The 717PSAP720 sequence binds to the UEV domain of the ESCRT-I subunit TSG101 (11–13). The 740PTPAPR745 PRM binds to SH3 domains of the endocytic adaptor proteins CIN85 (14) and CD2AP (15). The SH3 domain is a ubiquitous PRM-binding domain, and indeed was the first one to be described. The ALIX PTPAPR sequence belongs to an unusual subset of two-faced SH3-binding PRMs, which bind to two SH3 domains at once, so dimerizing the SH3 domain-containing proteins (16). The 748PPTKPQPPARPPP760 region contains the binding site for the conventional SH3 domains of the Src (17, 18) and Hck kinases (19) and the endocytic protein endophilin (20). The multi-SH3 domain ubiquitin ligase POSH binds to the ALIX PRR (21), probably through one of these SH3-binding PRMs. Thus far, none of the three PRMs described above have been shown to be essential for any of the ESCRT-related functions of ALIX. The midbody protein CEP55 binds to the 800GPPYPTY806 sequence in ALIX, and this interaction is required for ESCRT-mediated membrane abscission in cytokinesis (22–25). The CEP55 binding site overlaps with the 799QGPPYPTYPGYPGYCQ814 sequence that binds to the Ca2+-binding apoptosis regulator ALG-2 (26). The ALIX PRR contains one PPXY motif, 834PPVY837, which could potentially serve as the binding site for the WW domain-containing ubiquitin ligase Nedd4 (27, 28). Taken together, the PRMs described above account for 47 of the 150 residues of the PRR.
The ALIX PRR has been implicated in both the dimerization and autoinhibition of ALIX. ALIX dimerizes both through its V domain (29, 30) and the 852PSYP855 sequence close to its C-terminus (31) (Fig. 2). The ability to dimerize via the PRR appears to be important for ALIX-mediated HIV-1 budding, although this requirement can be overridden in the case of another virus, EIAV, which seems to interact more strongly with ALIX than does HIV-1 (31). There is clear evidence that the PRR autoinhibits both the CHMP4 and HIV-1 LYPXnL motif binding activities of ALIX (32–34). The autoinhibition seems to involve a contact between the N-terminus of the PRR and a Src-phosphorylation site (17) on a tip of the Bro1 domain (32) known as “patch 2” (35). This contact is not compatible with the conformation of the ALIX Bro1-V construct (ALIX-ΔPRR) in the crystallized monomer (36) or the solution dimer (30). These ALIX-ΔPRR structures likely represent an active conformation of ALIX. The current autoinhibition model thus suggests that the conformation of full-length autoinhibited ALIX will be different from any state observed so far (Fig. 2). In summary, current data suggest a dual role for the ALIX PRR as a locus for multiple PRMs and as a regulator of ALIX conformation and activity.
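The residue bookkeeping for the ALIX PRR can be checked with a short calculation. The Python sketch below takes the inclusive residue ranges quoted in the text and counts the distinct positions they cover, so that the overlapping CEP55 and ALG-2 sites are not counted twice. The labels and the helper name `covered_residues` are illustrative. Under this tally, the TSG101, SH3, CEP55/ALG-2, and PPXY sites cover 43 positions, and adding the 852PSYP855 dimerization sequence gives 47 of 150, or about 31%. That the 47-residue figure was reached by including the dimerization sequence is an assumption here, not something the text states.

```python
def covered_residues(ranges):
    """Count distinct residue positions covered by inclusive (start, end) ranges."""
    covered = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))
    return len(covered)


# Inclusive residue ranges as quoted in the text for the ALIX PRR.
ALIX_PRMS = {
    "TSG101 UEV site (PSAP)": (717, 720),
    "CIN85/CD2AP SH3 site": (740, 745),
    "Src/Hck/endophilin SH3 region": (748, 760),
    "CEP55 site": (800, 806),   # lies within the ALG-2 site
    "ALG-2 site": (799, 814),
    "PPXY motif": (834, 837),
}
DIMERIZATION_SITE = (852, 855)  # assumption: counted toward the total
PRR_LENGTH = 150

listed = covered_residues(ALIX_PRMS.values())
with_dimerization = covered_residues([*ALIX_PRMS.values(), DIMERIZATION_SITE])
print(listed, with_dimerization, round(with_dimerization / PRR_LENGTH, 2))
```

Either way, roughly two-thirds of the PRR is left without an assigned interaction.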
TSG101 is one of four subunits of the ESCRT-I complex. Its N-terminal UEV domain binds both to the ubiquitin tag on ESCRT cargo (37) and to a PRM of the form P(S/T)AP (38–41). The UEV domain is connected to the heterotetrameric core of ESCRT-I (42) by a 70-residue PRR with a 30% Pro content. The only PRM of known function within this region is the CEP55 binding sequence 157GPPNTSY163 (22, 23, 43) (Fig. 1). As described below, the TSG101 PRR also binds ALG-2, but the precise motif has not been mapped. Apart from providing a tether connecting the UEV domain to the core, it is unclear why this region has evolved to be as big or as Pro-rich as it is. The yeast ortholog of TSG101, Vps23, contains a 47-residue PRR with a 45% Pro content within its UEV-core linker (Fig. 1). The Vps23 PRR contains no identified PRMs. Indeed, none of the human PRM-binding proteins mentioned above are conserved in yeast, except for Bro1 (ALIX) and Vps23 (TSG101) themselves. The Vps23 PRR does contain one contiguous tract of six consecutive Pro residues. The Pro6 tract seems at least as likely to contribute to the mechanical rigidity of the Vps23 PRR as it does to molecular recognition, and one wonders if the most ancient role for the PRR might not be a mechanical and conformational one.
The P(S/T)AP motif is best known as the major late domain of HIV-1, HTLV-1, Ebola, and a half-dozen or so other characterized viruses (3) (Table 1). In human cells, the best characterized P(S/T)AP motif is that of the Hrs subunit of ESCRT-0 (44–46). The P(S/T)AP sequence itself is abundant in the human proteome, but many P(S/T)AP sequences are buried within folded domains and so not functionally available for ESCRT interactions. TSG101 itself contains an internal PTAP motif, but this sequence is predicted to be buried within the stably folded headpiece domain based on the conformation of its ortholog in the crystallized yeast ESCRT-I core complex (42). Indeed, short peptides spanning internal P(S/T)AP sequences in both TSG101 and another human ESCRT-I subunit, VPS37B, bind to the UEV domain in vitro (47, 48), but they do not seem to be functional in the context of the complete ESCRT-I complex in cells (48).
Evidence for a physiological ESCRT-binding role for the P(S/T)AP is available for a handful of proteins besides Hrs. These include Tom1L1 (49), GGA3 (50), and the ubiquitin ligase Tal (51). However, at least fifteen additional P(S/T)AP-containing proteins bind to TSG101 in cells, including proteins of RNA processing and silencing, vesicular trafficking from the endoplasmic reticulum, and transcriptional regulation (48). The only known ligand for the P(S/T)AP motif is the UEV domain of the TSG101 subunit of ESCRT-I. The affinity of the shortest P(S/T)AP peptides tested is modest, in the range of 50–300 μM (52, 53), with slightly higher affinities seen for longer HIV-1 Gag constructs (54). P(S/T)AP peptides bind in an extended (β-strand) conformation to a site involving the C-terminus of the UEV domain (53, 55) (Fig. 3A). The Ser/Thr of the motif is required because the side-chain hydroxyl makes a hydrogen bond with the main-chain backbone nitrogen of Asn69. The Ala is required because it packs in a pocket that is too small to allow any other residue. The final Pro of the motif is also tightly packed into a pocket that fits its side-chain uniquely. On the other hand, the first Pro of the motif is not strictly required (48). Indeed, some synthetic peptides with hydrophobic substituents added to the first Pro actually have a higher affinity for the UEV domain (52, 56). The yeast ortholog of Hrs is Vps27, and also uses a PRM to bind to the UEV domain of Vps23. The relevant PRM in yeast has the form SDP (57), and is important for the co-localization of ESCRT-0-mediated cargo clusters with ESCRT-I-mediated membrane buds (58). The yeast SDP motif binds to a completely different site on the UEV domain as compared to the P(S/T)AP motif of multicellular eukaryotes (59) (Fig. 3B). 
SDP motifs seem to be important for the recruitment of the ESCRTs in other contexts in yeast, for example, one of the yeast arrestin-related trafficking proteins (ARTs) is apparently recruited by this mechanism (60). The P(S/T)AP and SDP sequences are not functionally interchangeable. The larger theme of PRM-UEV interactions is preserved across the eukarya, but the structural details have almost nothing in common between yeast and multicellular eukaryotes.
The PTAP-UEV domain interaction is essential for HIV-1 budding. Current AIDS antivirals target the three enzymes encoded by the viral genome. Because of the prospect of drug resistance, there is considerable interest in targeting host-virus interactions such as that of the Gag PTAP and TSG101 UEV domain. The obvious concern in targeting host proteins is toxicity from interference with vital cellular functions. Fortunately, mutational disruption of the P(S/T)AP binding site on the TSG101 UEV domain does not seem to impair receptor downregulation by the ESCRTs, at least not for the degradation of endogenous EGF receptors (53). On the other hand, mutational disruption of the site does lead to a partial defect in ESCRT-mediated cytokinesis (22). Thus a spectrum of phenotypes is seen following interference with P(S/T)AP recognition: strong (HIV-1 release), intermediate (cytokinesis), or none (downregulation of endogenous EGF receptor).
The main function of PPXY motifs is to target the proteins containing them to interact with WW domains (so named for their two signature Trp residues). The ESCRT complexes per se do not contain WW domains, but the WW domain-containing Nedd4 family ubiquitin ligases (Rsp5 in yeast) are intimately connected with the ESCRTs. PPXY late domains of many viruses, including Rous sarcoma, Marburg, Ebola, and rabies viruses (3) (Table 1), recruit the Nedd4 family ligases WWP1, WWP2, and Itch to the site of virus budding (61). The link from these ligases to the ESCRTs probably involves more than simply the ubiquitination of viral proteins. The arrestin-related trafficking (ART) proteins (62) contain PPXY motifs of their own, bind directly to ESCRTs, and appear likely to play a role as bridges connecting the ligases to the ESCRTs in PPXY-dependent budding (63). Some cargo, such as the epithelial sodium channel ENaC, contain PPXY motifs and appear to be shunted into the ESCRT pathway through a direct interaction with WW domains of Nedd4 family ligases. WW domains fall into classes with additional specificity determinants. The third WW domain of Nedd4 belongs to class I, and its most preferred motif is PPXYESψΦ, where ψ and Φ refer to aromatic and hydrophobic residues, respectively (64). The ENaC β subunit has a PPXY peptide conforming to the preferred motif, and it binds to the third WW domain of Nedd4 with Kd = 20 μM (65). The structure of the ENaC PPXY motif with this WW domain showed that the first three residues of the PPXY motif are in a PPII conformation and directly contact one of the signature Trp residues (65) (Fig. 3C). The final Tyr residue of the motif contacts the ring of an essential His side-chain of the WW domain and makes other hydrophobic contacts. Most of the Nedd4-family substrates that go on to become ESCRT cargo do not contain PPXY motifs.
Of the various mechanisms for the ubiquitination and entry of these cargo into the ESCRT pathway, it currently appears that ART proteins may in many cases be the direct cargo selectors (62). The ARTs then contribute their own PPXY motifs to recruit the Nedd4 family ligases, leading to cargo ubiquitination and finally internalization into multivesicular bodies (MVBs) by the ESCRTs.
The LYPXnL motif, where n=1 or 3, is the third example of a Pro-containing ESCRT-interacting sequence first discovered as a viral late domain. These motifs bind to the V domain of ALIX with low micromolar affinity (29, 36, 66, 67). The N-terminal LYP sequence and the C-terminal Leu bind in the same pockets on arm 2 of the V domain (67) regardless of whether they are separated by one or three residues. In the case of the motif from HIV-1 Gag p6, where n = 3, the intervening residues form one turn of an α-helix (67). In EIAV p9, n = 1, and the single intervening residue is in an extended conformation (67) (Fig. 3D). ALIX and the Gag LYPXnL motif sustain a significant level of residual HIV-1 budding in the absence of the TSG101-PTAP interaction. Indeed, in Jurkat cells, > 20 % particle release is seen even when both the PTAP and LYPXnL motifs are crippled (68). Any concerted therapeutic program to shut down ESCRT-mediated HIV budding would at a minimum need to target the ALIX V domain in addition to the TSG101 UEV domain, so it is fortunate that all of these structures are now in hand.
The concept of the LYPXnL motif and the ALIX V domain as a canonical motif-domain interaction pair is elegant, but we now know it is too simple. Certain SIV Gag proteins lack LYPXnL motifs, yet still bind ALIX with affinities in the tens of micromolar (69) (Fig. 3E). These motifs all bind to the same site on the V domain, and all have a Tyr residue that interacts with the same pocket on the V domain. Otherwise, these sequences have almost nothing in common; one of the sequences does not even have a Pro among its binding determinants. So far, the LYPXnL motif has been a less productive guide to normal ESCRT physiology than the other viral PRMs. In Aspergillus, the transcription factor PacC contains such a motif, and interacts with an ALIX homolog in the course of its pH-dependent cleavage and activation (70). In the canonical ESCRT pathway, clear-cut functions for V domain interactors have yet to be established. The absence of known functions for this domain in human physiology is at least encouraging with respect to its targetability for antivirals. Perhaps the ALIX V domain will, in the course of its normal function in the ESCRT pathway, turn out to use nonstandard interactions such as those with the SIV Gag sequences.
Only three functional occurrences of the specialized GPPX3Y motif have been reported to date. These are within the PRRs of ALIX and TSG101, described above, and that of the germ cell cytokinesis inhibitor TEX14 (71). The function of the GPPX3Y motif is to target the protein containing it to the midbody connecting a pair of dividing cells. There, the ESCRTs are involved in the process of constricting and finally severing the narrow membrane neck connecting the two daughter cells (22–25). Peptides derived from the GPPX3Y sequences of ALIX and TSG101 bind with Kd ~ 1 μM affinity to the coiled-coil midbody protein CEP55 (43). One peptide binds to one CEP55 dimer. Despite its conservation between ALIX and TSG101, the Gly of the motif is not particularly important for binding. The Pro residues are much more important, and the Tyr is critical (43). Some questions remain; for example, it is not clear how essential the spacing of three residues between the diPro and the Tyr is. A more complete understanding of the binding determinants for this site, such as from a peptide library study, would be helpful and might perhaps identify other protein ligands for CEP55. In contrast to other PRMs, which are bound by discrete, specialized protein domains, the GPPX3Y motifs bind to a region within the much larger CEP55 coiled coil. The ESCRT- and ALIX-binding region (EABR) within CEP55 is differentiated from the rest of the coiled coil because it possesses unusual bulky and charged residues at the a and d positions. These unusual features force the two coils apart, creating a single binding site for the GPPX3Y motif. In spite of the presence of a diPro sequence, the motif does not have a PPII conformation. The structure of the peptide is best described as being in a β-conformation over the GPP residues, followed by the X3Y residues in a β-turn. The GPPX3Y-binding site on CEP55 consists of two subsites, one that binds to the GPP portion of the motif, and a second that surrounds the Tyr (Fig. 3F).
The site seems to be optimized primarily to bind the surfaces of the Pro and Tyr rings as opposed to selecting a special conformation.
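To compare the affinities quoted in this review on a common scale, a dissociation constant can be converted to a standard binding free energy via ΔG° = RT ln(Kd / 1 M). The Python sketch below does this for the Kd values given in the text. The temperature of 298.15 K and the 1 M standard state are assumptions, not conditions reported for the cited measurements, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol from Kd (1 M standard state)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


# Kd values as quoted in the text, converted to molar units.
KD_FROM_TEXT = {
    "P(S/T)AP-UEV, weak end of range": 300e-6,
    "P(S/T)AP-UEV, strong end of range": 50e-6,
    "ENaC PPXY-Nedd4 WW3": 20e-6,
    "GPPX3Y-CEP55": 1e-6,
}

for label, kd in KD_FROM_TEXT.items():
    print(f"{label}: {binding_free_energy(kd):.1f} kcal/mol")
```

At this temperature each tenfold change in Kd corresponds to about 1.4 kcal/mol, so the span from the weakest P(S/T)AP peptides to the GPPX3Y-CEP55 interaction is roughly 3.4 kcal/mol.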
ALG-2 (apoptosis-linked gene 2) is a dimeric Ca2+-binding EF hand protein that binds to the PRRs of both ALIX (20) and TSG101 (72). The binding is Ca2+-dependent, and the ALG-2 dimer appears to be capable of acting as a Ca2+-dependent bridge to recruit ALIX and TSG101 to one another in cells (73). The physiological ramifications of Ca2+ signaling into the ESCRT system have yet to be deeply explored, but we now understand a great deal about this interaction at least at the structural level. The binding site for ALG-2 on ALIX has been mapped to a Pro, Gly, and Tyr rich section that overlaps with all of the CEP55 binding sequence described above, and extends beyond it. Clearly, ALG-2 and CEP55 cannot bind to ALIX simultaneously. The corresponding sequence in TSG101 has not been mapped in detail, but by analogy to ALIX probably corresponds more or less to 158PPNTSYMPGMPGGI171. The ALIX peptide was co-crystallized with ALG-2 (26), and the segment 801PPYPTYPGYPGY812 interacts extensively with ALG-2 (Fig. 3G). The ALG-2 pocket that enfolds the N-terminal PPYP residues only forms the binding-competent conformation in the presence of Ca2+ (or Zn2+ as a surrogate), explaining the Ca2+-dependence of binding. Despite the high density of Pro residues in this motif, there are apparently enough non-Pro residues that the type II polyproline helix is not the preferred conformation. Instead, the peptide manifests a series of 2–3 residue stretches that are in a β conformation. These are connected by a series of ~90 degree bends at Gly and Pro residues (and in one case, Thr805). The turn-favoring propensity of Gly and Pro probably facilitates these bends and may explain the conserved role of so many Gly and Pro residues in this PRM. Two Tyr residues (803 and 809) are deeply buried in hydrophobic pockets, while another two interact more superficially. Three of the Pro residues (802, 804, and 810) form extensive hydrophobic interactions. 
On the other hand, the surface-exposed Pro807 seems to have a purely conformational role.
The past few years have shown us just how complex the rules can be when it comes to PRMs in ESCRT function. The P(S/T)XP:UEV domain case showed how two seemingly similar motifs, differing only by an Ala versus an Asp at the X position, can turn out to interact at completely different binding sites on one conserved domain in different species. Just as two apparently similar motifs can interact with very different binding sites, a single binding site can recruit very different motifs. The clearest case amongst the ESCRT PRMs is that of the unusual ALIX V domain binding sequences in certain SIV strains that, beyond a single Tyr residue, have no obvious common features with the canonical motif. In an exciting development, the combination of bioinformatics and proteomics is beginning to provide us with a more reliable picture of the PRM interactome (48). With respect to the long PRRs of ESCRT proteins such as ALIX and TSG101, two-thirds or more of the residues within the PRR cannot be accounted for in terms of known interactions. One wonders whether the majority of binding partners have yet to be discovered, or whether it is the unique conformational properties of Pro, in rigidifying and extending the reach of the PRR in three dimensions, that make these regions so important. With respect to targeting PRM-ESCRT interactions with antivirals, the accumulated data reveal significant differences in the relative robustness of viral and host pathway dependencies to interference with ESCRT-PRM interactions. In principle, this should be favorable to their targeting by antivirals. The potential existence of currently uncharacterized physiological roles of ESCRT PRMs in areas such as RNA processing could introduce complications, however. The biggest challenge in targeting this interaction may be at the level of chemistry rather than biology, given the issues with the screenability and druggability of peptide-protein interactions.
Nevertheless, vigorous efforts are being devoted to overcoming these obstacles (52, 56, 74).
This work was supported by the Intramural Program of the NIH, NIDDK and the Intramural AIDS Targeted Anti-viral Program of the Office of the Director, NIH (J. H. H.), and an Intramural AIDS Research Fellowship to X. R.