Members of the p24 (p24/gp25L/emp24/Erp) family of proteins have been shown to be critical components of the coated vesicles that are involved in the transportation of cargo molecules from the endoplasmic reticulum to the Golgi complex. The p24 proteins form hetero-oligomeric complexes and are believed to function as receptors for specific secretory cargo.
Using sensitive sequence-profile analysis methods, we identified a novel β-strand-rich domain, the GOLD (Golgi dynamics) domain, in the p24 proteins and several other proteins with roles in Golgi dynamics and secretion. This domain is predicted to mediate diverse protein-protein interactions. Other than in the p24 proteins, the GOLD domain is always found combined with lipid- or membrane-association domains such as the pleckstrin homology (PH), Sec14p and FYVE domains.
The identification of the GOLD domain could aid in directed investigation of the role of the p24 proteins in the secretion process. The newly detected group of GOLD-domain proteins, which might simultaneously bind membranes and other proteins, points to the existence of a novel class of adaptors that could have a role in the assembly of membrane-associated complexes or in regulating assembly of cargo into membranous vesicles.
The Golgi complex is the central secretory organelle of most eukaryotic cells and consists of membranous stacks called cisternae [1,2]. Secreted proteins are synthesized in the endoplasmic reticulum (ER) and are specifically packaged into vesicles that bud off from the ER in a GTP-dependent process [3,4]. These lipid vesicles are coated with the COPII coat-protein complex and are equipped with the ATP-dependent vesicle-fusion apparatus. They carry the secretory cargo to the cis surface of the Golgi complex, with which they fuse, delivering the cargo. A second type of vesicle, coated by the COPI coat-protein complex, buds off the Golgi membrane as part of a retrograde pathway that returns proteins not targeted for secretion to the ER [3,4].
Studies on the secretory system in crown-group eukaryotes (plants, animals and fungi) have uncovered a family of proteins, the p24 (p24/gp25L/emp24/Erp) family, that have an important role in cargo selection and packaging into COPII-coated vesicles [5,6,7,8]. They might also function in excluding secreted proteins from COPI-coated retrograde vesicles [9,10]. Members of the p24 family are type I membrane proteins, with a small carboxy-terminal cytoplasmic tail that interacts with the vesicle coat proteins and a globular lumenal region that probably interacts with the cargo [11,12]. They are abundantly distributed on the membranes of the vesicles budding off the ER and the cis Golgi membranes. The p24 proteins belong to at least four distinct subfamilies [8,12] and form hetero-oligomeric complexes that contain at least one member from each subfamily. This heteromerization of the p24 proteins has been shown to require a coiled-coil stretch at the extreme carboxyl terminus of their lumenal regions.
Improved understanding of the p24 family may shed light on the evolution and function of the Golgi apparatus in eukaryotes. With this objective, we conducted a computational sequence analysis of the p24 proteins and show that they contain a conserved globular domain that is also present in several other Golgi and lipid-traffic proteins. We present evidence that this module is likely to serve as a common denominator in protein-protein interactions in several distinct contexts, such as in secretory vesicles and on the Golgi peripheral membrane. The proliferation of this superfamily appears to have been central to the diversification of the eukaryotic secretory apparatus.
The bona fide p24 proteins contain a short carboxy-terminal tail that interacts with the COP-complex proteins through specific short peptide motifs. The amino-terminal region that faces the lumen is much larger and is predicted to form a compact globular unit. As this region of the protein is likely to contain a conserved globular domain that mediates other functional interactions of these proteins, we sought to investigate its complete diversity and potential evolutionary connections. We carried out a profile search of the Non-Redundant protein database (of the National Center for Biotechnology Information, NCBI) using the PSI-BLAST program, seeded with the lumenal region of the Caenorhabditis elegans p24 family member K08E4.6 (the profile-inclusion threshold was set at 0.01 and the search iterated until convergence). This search readily detected the classical p24 family members that are found in six to nine copies in the proteomes of most organisms belonging to the eukaryotic crown group. In addition, this search retrieved several other proteins that do not belong to the p24 family with statistically significant expectation (E)-values (E < 0.001, see Figure 1 legend). These proteins include yeast Osh3p, a cytoplasmic oxysterol-binding protein; animal Sec14-like proteins that are involved in secretion; human GCP60 (also called PAP7, a peripheral-type benzodiazepine receptor-associated protein), which interacts with the Golgi integral membrane protein Giantin; and several other uncharacterized eukaryotic proteins with different lipid-binding domains (Figure 1). Reciprocal searches initiated with this region from the newly detected proteins showed that they were more closely related to each other, but in subsequent iterations they recovered the classic p24 family members at significant E-values, suggesting that all these conserved regions define a novel superfamily of protein domains.
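The "iterate until convergence" logic of such a profile search can be illustrated with a minimal sketch. This is not PSI-BLAST itself: the `search` callback, the sequence identifiers and the E-values below are all invented toy values, chosen only to show how members recruited in one round can pull in further members in the next, until the included set stops changing.

```python
from typing import Callable, Dict, FrozenSet, Tuple

def iterate_until_convergence(
    seed: str,
    search: Callable[[FrozenSet[str]], Dict[str, float]],
    inclusion_threshold: float = 0.01,
    max_rounds: int = 50,
) -> Tuple[FrozenSet[str], int]:
    """Repeat a profile search until the set of included sequences is stable.

    `search` is a hypothetical stand-in for one search round: it takes the
    current profile members and returns {sequence_id: E-value}.
    """
    members = frozenset([seed])
    for round_no in range(1, max_rounds + 1):
        hits = search(members)
        # Only hits at or below the inclusion threshold join the profile.
        included = frozenset(
            sid for sid, evalue in hits.items() if evalue <= inclusion_threshold
        ) | {seed}
        if included == members:  # nothing changed: the search has converged
            return members, round_no
        members = included
    return members, max_rounds

# Toy similarity data (invented numbers, not results from any real search).
TOY_EVALUES = {
    "seed": {"A": 1e-5, "B": 0.5},
    "A": {"seed": 1e-6, "B": 1e-3},
    "B": {"A": 1e-3, "C": 5.0},
    "C": {},
}

def toy_search(members: FrozenSet[str]) -> Dict[str, float]:
    """Report, for each neighbor, the best E-value seen from any member."""
    hits: Dict[str, float] = {}
    for member in members:
        for neighbor, evalue in TOY_EVALUES[member].items():
            hits[neighbor] = min(hits.get(neighbor, float("inf")), evalue)
    return hits

final_members, rounds = iterate_until_convergence("seed", toy_search)
```

In this toy run, sequence "B" is too distant to be included from the seed alone but is recruited once "A" has joined the profile, which mirrors how distant relatives are typically recovered only in later iterations.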
Separate predictions of the secondary structure of this domain from the p24 family and from the newly detected proteins showed that the two groups had essentially the same core structural elements, further reinforcing their relationship. As this conserved domain is present in at least three distinct classes of proteins related to Golgi dynamics (animal Sec14 proteins, the p24 family and GCP60-like proteins), we name this conserved region the GOLD domain.
The presence of the GOLD domain at the extreme amino or carboxyl terminus of the Osh3p and animal Sec14 proteins, respectively, allowed us to establish accurate boundaries for it. The domain is typically between 90 and 150 amino acids long and, in the p24 family, it comprises almost the entire lumenal region, with the exception of an α-helical extension of approximately 50 amino acids that precedes the transmembrane segment. Most of the size difference observed in the GOLD-domain superfamily is traceable to a single large low-complexity insert that is seen in some versions of the domain. A secondary-structure prediction for the domain using the PHD program reveals that it is likely to adopt a compact all-β-fold structure with six to seven strands. Most of the sequence conservation is centered on the hydrophobic cores that support these predicted strands. The predicted secondary-structure elements and the size of the conserved core of the domain suggest that it may form a β-sandwich fold with the strands arranged in two β sheets stacked on each other.
Experimental studies so far on diverse proteins containing GOLD domains point to a role for this domain in protein-protein interactions. A region of the GCP60 molecule that rather precisely encompasses the GOLD domain has been shown to bind to the cytoplasmic region of the Golgi membrane protein Giantin. Cross-linking experiments have suggested that the p24 proteins interact directly with the cargo molecules that are present in the lumen of the COPII-coated vesicles and that they are, accordingly, cargo receptors. However, yeast deletion mutants lacking all the p24 proteins grow similarly to wild type, although they show delays in translocation of a subset of cargo molecules such as invertase and Gas1p from the ER to the Golgi, and increased secretion of resident ER proteins. Certain members of the p24 family from vertebrates have also been shown to bind to specific ligands such as the interleukin-1 receptor-like molecule T1/ST2 and might aid its proper expression on the cell surface. These observations suggest that the p24 subset of the GOLD domains probably function as discriminators that selectively interact with particular proteins to influence their loading into vesicles. The GOLD domains show considerable variability in some of the loops that are predicted to extrude from the core β-sandwich-like structure (Figure 1). These loops might form exposed surfaces that provide the GOLD domains with the discriminatory capacity necessary for their interactions with diverse ligands.
With the exception of the p24 proteins, which have a simple architecture with the GOLD domain as their only globular domain, all other GOLD-domain proteins contain additional conserved globular domains (Figure 2). In these proteins, the GOLD domain co-occurs with lipid-, sterol- or fatty acid-binding domains such as the PH [20,21], Sec14p, FYVE, oxysterol-binding and acyl-CoA-binding domains, suggesting that these proteins may interact with membranes. The FYCO1 protein, which combines a GOLD domain with a FYVE domain, also contains a RUN domain, an uncharacterized α-helical domain that may have a role in the interaction of various proteins with cytoskeletal filaments [24,25]. An orthologous group of proteins typified by human Sec14L1, which is conserved in all animals, has, in addition to the carboxy-terminal fusion of the Sec14p and GOLD domains, a previously unrecognized, conserved amino-terminal domain (Figures 2, 3). This domain has so far been found only in eukaryotes, and occurs in stand-alone form in several proteins, including the human PRELI protein and the yeast MSF1p' protein. The PRELI/MSF1p' domain is approximately 170 residues long and is predicted to assume a globular α + β fold with six β strands and four α helices (Figure 3). MSF1p' is proposed to be involved in mitochondrial protein sorting, suggesting that the PRELI/MSF1p' domain may also have a function associated with cellular membranes.
Thus, all GOLD-domain proteins can be divided into two architectural categories: the p24-like category, in which the GOLD domains project into the lumen, anchored in the membrane by the membrane-spanning helix (category 1); and proteins in which the GOLD domain occurs at the extreme amino or carboxyl terminus, with additional domains that are known to interact with lipid membranes (category 2) (Figure 2). GCP60, which is peripherally associated with the Golgi membrane, is one of the proteins in the second category that has been experimentally characterized. It has been shown that overexpression of a region of this protein encompassing the GOLD domain caused disassembly of the Golgi structure and abrogated protein transport from the ER to the Golgi.
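The two-category rule just described can be written down as a small classifier. The sketch below is only an illustration of the rule as stated in the text; the function name, the simplified domain labels and the representation of an architecture as an ordered list of globular domains plus a transmembrane flag are all assumptions made for this example.

```python
from typing import List, Optional

# Lipid- or membrane-association domains named in the text (labels simplified).
LIPID_BINDING = frozenset(
    {"PH", "Sec14p", "FYVE", "oxysterol-binding", "acyl-CoA-binding"}
)

def gold_category(domains: List[str], has_tm_helix: bool) -> Optional[int]:
    """Assign a GOLD-domain protein to architectural category 1 or 2.

    `domains` lists globular domains from amino to carboxyl terminus.
    Returns None when the architecture fits neither category.
    """
    if "GOLD" not in domains:
        return None
    others = [d for d in domains if d != "GOLD"]
    # Category 1: GOLD is the only globular domain, anchored by a TM helix.
    if not others and has_tm_helix:
        return 1
    # Category 2: GOLD at an extreme terminus plus a lipid-binding domain.
    gold_is_terminal = domains[0] == "GOLD" or domains[-1] == "GOLD"
    if gold_is_terminal and any(d in LIPID_BINDING for d in others):
        return 2
    return None

# Architectures as described in the text (hypothetical encoding).
examples = {
    "p24": gold_category(["GOLD"], has_tm_helix=True),
    "Osh3p": gold_category(["GOLD", "PH", "oxysterol-binding"], has_tm_helix=False),
    "Sec14L1": gold_category(["PRELI/MSF1p'", "Sec14p", "GOLD"], has_tm_helix=False),
}
```

Encoding the rule this way makes the two defining criteria explicit: membership in category 1 depends on the absence of other globular domains, whereas category 2 depends on a terminal GOLD domain paired with a lipid-binding partner.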
These observations can be accommodated by two (not mutually exclusive) hypotheses regarding the functions of these proteins. The GOLD proteins belonging to the second architectural category could function as double-headed adaptors that interact with both a specific protein (via the GOLD domain) and different cellular lipid membranes. Thus, GCP60 and GOLD proteins with analogous architectures could help in the assembly of vesicular or Golgi-membrane-associated protein complexes by tethering specific proteins to the membranes, with the GOLD domain binding the protein targets and the lipid-binding domain binding the membrane. Alternatively, at least some of the category-2 proteins could function as a previously unrecognized class of vesicular cargo-loading molecules that associate with the membrane via their lipid-binding domains and deliver their protein ligands via the GOLD domain. The observation that deletion mutants lacking all the p24 proteins still show normal trafficking of certain proteins, such as carboxypeptidase Y, suggests that there are some protein-trafficking pathways that are unaffected by their absence. Thus, the GOLD-domain proteins of category 2 may have a specific role in regulating the secretion of molecules that are not affected by the p24 proteins. The hetero-oligomerization of the p24 proteins via the coiled-coil regions carboxy-terminal to the GOLD domain seems to help in generating combinatorial diversity for their interactions with multiple ligands. The presence of extensive coiled-coil segments in some of the category-2 GOLD-domain proteins, such as FYCO1, suggests that they might also form oligomers, like the p24 proteins.
Similarity-based clustering and phylogenetic analysis divide the GOLD domains into two primary divisions that precisely mirror the two categories established on the basis of domain architectures (Figure 2). This division was also supported by a synapomorphic (shared derived) feature in the form of two conserved cysteines, which is restricted to the p24 family (category-1 proteins). Likewise, the presence of a specific insert between strands 1 and 2 with a characteristic conserved tryptophan serves as a synapomorphic feature for category-2 GOLD domains (Figure 1). An analysis of the phyletic patterns suggests that the p24 family had already differentiated into at least four distinct subfamilies in the common ancestor of plants, animals and fungi. The detection of multiple members of the p24 family in early-branching eukaryotes such as Cryptosporidium parvum and kinetoplastids suggests that some of this diversification was probably already under way early in eukaryotic evolution. Within the eukaryotic crown group, we obtained evidence of specific instances of duplications and gene losses that are restricted to particular lineages. The most striking case is seen in Arabidopsis thaliana, which appears to have proliferated the Erv25 subfamily (five to six members), but lacks the Erp2p and Erp5p subfamilies. The second major family of GOLD domains (category 2) is so far only attested in the crown group. In fungi, this group is typified by Saccharomyces cerevisiae Osh3p, which combines an amino-terminal GOLD domain with PH and oxysterol-binding domains. The greatest architectural diversity of this group is seen in animals (Figure 2), suggesting that there was increased proliferation and domain shuffling among these proteins concomitant with the evolutionary emergence of the animals. This might correlate with the increased complexity of animal-specific secretory functions.
A novel β-strand-rich domain was identified in numerous eukaryotic proteins, including the p24 proteins, which appear to have a function related to the Golgi complex, secretion or protein sorting. These GOLD domains are predicted to be involved in specific protein-protein interactions. Other than the p24 proteins, GOLD domains are present in several proteins where they occur at the extreme termini and are combined with diverse membrane- or lipid-binding domains. These proteins are predicted to be double-headed adaptors that may help in the assembly of protein complexes on membranes or in the packaging of specific cargo molecules in membranous vesicles. The identification of the GOLD domain may help in a directed dissection of p24-family function and provide novel candidate molecules for experimental studies on secretion and sorting.
The Non-Redundant (NR) database of protein sequences (National Center for Biotechnology Information, NIH, Bethesda) was searched using the BLASTP program. Profile searches were conducted using the PSI-BLAST program with either a single sequence or an alignment used as the query, with a profile-inclusion expectation (E)-value threshold of 0.01, and were iterated until convergence [13,28]. Previously known conserved protein domains were detected using the corresponding PSI-BLAST-derived position-specific scoring matrices (PSSMs). The PSSMs were prepared by choosing one or more starting queries (seeds) for a set of most frequently encountered domains (see reference  for details) and run against the NR database until convergence with the -C option of PSI-BLAST to save the PSSM. We ensured that at convergence no false positives were included in the profiles. This profile database can be downloaded from  or used on the internet via the RPS-BLAST program. All globular segments of proteins that did not map to domains with previously constructed PSSMs were searched individually using PSI-BLAST to detect any additional domains that may have been overlooked.
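For readers who wish to set up a comparable search today, the following sketch assembles a command line for the `psiblast` program of the current NCBI BLAST+ suite. This is an assumption about a modern equivalent, not the invocation used in this work, which relied on the earlier PSI-BLAST release and its -C option. The file and database names are hypothetical placeholders, and the command is only built here, not executed.

```python
from typing import List

def psiblast_command(
    query_fasta: str,
    database: str,
    pssm_out: str,
    inclusion_ethresh: float = 0.01,
) -> List[str]:
    """Build an argument list for BLAST+ psiblast.

    In BLAST+, -num_iterations 0 requests iteration until convergence and
    -out_pssm saves the final position-specific scoring matrix.
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-inclusion_ethresh", str(inclusion_ethresh),
        "-num_iterations", "0",
        "-out_pssm", pssm_out,
    ]

# Hypothetical file names, for illustration only.
command = psiblast_command("gold_seed.fasta", "nr", "gold_domain.pssm")
command_line = " ".join(command)
```

Returning a list rather than a single string keeps the arguments safe to pass to a process launcher such as `subprocess.run` without shell quoting.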
Multiple alignments were constructed using the T-Coffee program, followed by manual correction based on the PSI-BLAST results. Protein secondary structure was predicted using a multiple alignment as the input for the PHD program. Signal peptides were predicted using the SIGNALP program [33,34] and the transmembrane regions were predicted using the TOPRED program. Phylogenetic analysis was carried out using the maximum likelihood, neighbor-joining and least-squares methods [36,37]. Briefly, this process involved the construction of a least-squares tree using the FITCH program or a neighbor-joining tree using the NEIGHBOR program (both from the Phylip package), followed by local rearrangement using the Protml program of the Molphy package to arrive at the maximum likelihood (ML) tree. The statistical significance of various nodes of this ML tree was assessed using the relative estimate of logarithmic likelihood bootstrap (Protml RELL-BP) with 10,000 replicates.
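The idea behind the RELL bootstrap can be sketched in a few lines. The code below is a generic illustration of the resampling principle, not the Protml implementation: it assumes that per-site log-likelihoods are already available for each candidate tree, resamples alignment sites with replacement, and records how often each tree attains the highest total log-likelihood. The function name and the toy values are invented for this example.

```python
import random
from typing import List, Sequence

def rell_bootstrap(
    site_lnl: Sequence[Sequence[float]],
    replicates: int = 10_000,
    seed: int = 0,
) -> List[float]:
    """Estimate RELL bootstrap proportions for candidate trees.

    site_lnl[t][s] is the log-likelihood of alignment site s under tree t.
    Sites are resampled with replacement; no likelihood is re-optimized.
    """
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    rng = random.Random(seed)
    wins = [0] * n_trees
    for _ in range(replicates):
        sample = rng.choices(range(n_sites), k=n_sites)
        totals = [sum(tree[s] for s in sample) for tree in site_lnl]
        best = max(range(n_trees), key=totals.__getitem__)
        wins[best] += 1
    return [w / replicates for w in wins]

# Toy per-site log-likelihoods (invented): tree 0 is better at every site.
toy_site_lnl = [
    [-1.0, -2.0, -1.5],
    [-1.2, -2.5, -1.9],
]
support = rell_bootstrap(toy_site_lnl, replicates=1000)
```

Support for an individual node can then be obtained by summing the proportions of all candidate trees that contain that node. Because only stored per-site values are resampled, many replicates are computationally cheap, which is what makes a count such as 10,000 practical.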
We thank Eugene Koonin for providing useful comments on the manuscript.