Ribosomal protein L14e is a component of the large ribosomal subunit in both archaea and eukaryotes. We report here a high resolution NMR solution structure of recombinant L14e and show that the N-terminal 57 residues adopt a classic SH3 fold. The protein contains a tight turn between strands 1 and 2 instead of the typical SH3 RT-loop, indicating that it is unlikely to interact with neighboring ribosomal proteins using the common SH3 site for proline-rich sequences. The remainder of the protein (39 residues) forms a largely extended chain with a short helix which packs onto the surface of the SH3 domain via hydrophobic interactions. It has the potential of adopting an alternative structure to expose a hydrophobic surface for protein-protein interactions in the ribosome without disruption of the SH3 fold. 15N relaxation data demonstrate that the majority of the C-terminal chain is well defined on the SH3 surface. The globular protein unfolds reversibly with a Tm of 102.8 °C at pH 7, making it one of the most stable SH3 domain proteins described to date. The structure of L14e is expected to serve as a model for other members of the L14e family, along with members of the COG2163 group, including L6e and L27e. Interestingly, the N-terminal sequence of L14e shows the greatest similarity of any Sulfolobus protein to the reported N-terminal sequence of Sac8b, a DNA-binding protein reported by Grote et al. (Biochim. Biophys. Acta 873, 405-413 (1986)). The likelihood that L14e and Sac8b are the same protein is discussed.
The ribosome is the universal molecular machine responsible for translating mRNA into polypeptides and proteins. It is composed of about 80 proteins assembled onto ribosomal RNA to form a 70S particle in prokaryotes (eubacteria and archaea) and a larger 80S particle in eukaryotes. The intact ribosome can be separated into two subunits referred to as small and large: 30S and 50S in prokaryotes, and 40S and 60S in eukaryotes. Although the archaeal ribosomal subunits are smaller than those in eukaryotes, the archaeal proteins are more similar to those in eukaryotes than to those in eubacteria (1). The structure of the bacterial ribosome has been defined by crystal structures of the 30S and 50S subunits (2-4) and the intact 70S particle (5-8). Crystal structures of the archaeal 50S subunit from Haloarcula (9) and the intact yeast ribosome (10) have been published, along with cryo-electron microscopy reconstructions of the canine ribosome (11, 12).
L14e is known to be a protein component of the large ribosomal subunit in eukaryotes and archaea based on 2D gel electrophoresis and sequencing, but its structure and position in the ribosome have not been described. The L14e sequence is conserved in archaea and eukaryotes (including yeast, Drosophila, rat, and human), where the protein is sometimes referred to as L14 (1, 13-16). L14e shows no homology to the bacterial L14p protein (unfortunately also sometimes referred to as L14) for which a structure has been determined (17). The archaeal protein is approximately 30% identical to the eukaryotic 60S ribosomal protein L14e (e.g. Homo sapiens NP_003964). Eukaryotic L14e is often significantly larger (e.g. 24 kD for rat L14) with additional domains at the C-terminus (18) which have been identified as nuclear targeting sequences, along with a bZIP domain (19). Auto-antibodies to L14e in humans have been associated with systemic lupus erythematosus (20). L14e is well conserved but is not found in all archaea; it is therefore presumably not essential, although homologous proteins may exist which can function in its place. Notably it does not occur in the halophilic archaeon Haloarcula, the source for the only structure of an intact archaeal ribosome available to date (21).
It is interesting to note that the N-terminal sequence of L14e from Sulfolobus acidocaldarius is homologous to the N-terminal sequence of Sac8b, a small, basic protein initially described by Dijk and coworkers as an 8 kD DNA-binding protein (22, 23). Sulfolobus expresses a number of small, thermostable, nucleic acid binding proteins, e.g. Sac7d, Sac8a, Sac8b, Sac10a, and Sac10b (22-26). The initial characterization of Sac8b was limited presumably by low expression levels. Sac8a and Sac8b differed in charge and DNA binding properties (22, 23), and gel filtration demonstrated that both proteins were monomeric in solution (23). No structural information was presented for 8b, other than the N-terminal 17 residue sequence (i.e. PAIEIGYIGVETRGDEA), which demonstrated that it was unrelated to the 7 and 10 kD proteins (23).
We report the high resolution NMR structure of L14e from S. solfataricus, along with a characterization of its stability and flexibility in solution. The protein adopts an SH3 fold with an additional C-terminal tail that is largely extended, but rigidly associated with the surface of the SH3 domain. Given the high stability and classic fold, the structure is expected to serve as an excellent model of the structure of L14e in both the Archaea and eukaryotes.
The gene sequence for L14e was located in the Sulfolobus solfataricus P2 genome sequence database (27), where it is designated rpl14e (locus SSO5763) for the ribosomal protein L14e. The sequence was amplified by PCR from Sulfolobus solfataricus DNA, cloned into pETBlue-2 (Novagen), and expressed in RosettaBlue(DE3) (Novagen). Uniformly 15N and/or 13C enriched L14e for NMR studies was obtained by supplementing minimal media with 15NH4Cl (Isotec) and/or 13C-glucose (Isotec). Protein expression was induced with IPTG (1 mM), the temperature was reduced to 27 °C for 8-10 hours, cells were harvested by centrifugation, and the pellet was stored at -80 °C. For cysteine mutants used in some studies, site-directed mutagenesis was used to convert cysteines 10 and 22 to alanines using the QuikChange II Site-Directed Mutagenesis Kit (Stratagene, La Jolla, CA) with primers supplied by IDT (Coralville, IA). DNA sequencing by contract with Functional Biosciences (Madison, WI) was used to verify the mutations.
Frozen cells were thawed, suspended in 100 ml of cold (4 °C) buffer (10 mM EDTA, 10 mM Tris-HCl, pH 8.0, 0.1% Triton X-100 and 0.5 mM PMSF), and lysed by sonication. DNase I (Sigma) was added (0.5 mg/ml), and the suspension was incubated at 37 °C for five minutes. The suspension was then incubated at 70 °C for 40 minutes to precipitate E. coli proteins, and the solution was clarified by centrifugation at 300000g. The supernatant was filtered (0.45 μm filter) and recombinant L14e was purified by cation exchange chromatography on a HiTrap SP (Pharmacia) column equilibrated with 10 mM KH2PO4 (pH 7). L14e was eluted with a linear 0-1.0 M NaCl gradient at about 0.38 M NaCl. Protein purity was demonstrated by SDS gel electrophoresis, and a molecular weight of 10727 determined by mass spectrometry (by contract with Stanford University Mass Spectrometry Laboratory) indicated that the N-terminal methionine was removed by methionine aminopeptidase. The protein sequence derived from the genome sequence, NP_341932 (Figure 1), was confirmed by the NMR assignments described below. For consistency, we have maintained the residue numbering scheme used in the database sequences (e.g. NP_341932), with the NMR structure sequence beginning with proline 2. The protein concentration was determined using a calculated extinction coefficient (28) of 0.129 ml·(mg·cm)⁻¹, which includes the expected contribution from a disulfide.
DSC measurements were performed using a MicroCal (Northampton, MA) Extended Range VP-DSC with a cell volume of 0.5 ml. Protein samples were dialyzed against an excess of the appropriate buffer for approximately 12 hours, and aliquots of protein and dialysis buffer were degassed with stirring under vacuum using the MicroCal accessory. Protein and buffer solutions were scanned at 1.5 °C/min from 5 to 125 °C under an applied pressure of 25 psi to prevent vaporization and bubble formation. Reversibility of thermal unfolding was demonstrated with repeated scans on the same sample. DSC data were analyzed using an IGOR Pro (Wavemetrics, Oregon) program (IgorDenat, available at http://daffy.uah.edu/thermo) as described elsewhere (29).
CD measurements were performed on an Olis (Bogart, GA) spectrometer using 0.15 mg/ml protein solutions in 10 mM KH2PO4, pH 7.0, 20°C, with a 0.1 cm path length cell. The secondary structure of the protein was determined using CDPro with a reference set of 37 proteins over a wavelength range of 185-240 nm (30).
NMR spectra were collected on a Varian (Palo Alto, CA) 800 MHz (18.7 T field) INOVA NMR spectrometer using a triple resonance probe with tri-axial pulsed field gradient capability, and a Varian 500 MHz (11.7 T field) INOVA NMR spectrometer with z-axis pulsed field gradient capability. Pulse sequences were either those provided in the Varian (Palo Alto, CA) BioPack, or were kindly provided by Dr. Lewis Kay (University of Toronto). All NMR spectra were collected at 30°C unless indicated otherwise. 1H chemical shifts were referenced using sodium 4,4-dimethyl-4-silapentane-1-sulfonate (DSS) as an internal reference. 13C and 15N chemical shifts were referenced indirectly to DSS and liquid ammonia, respectively, using the appropriate frequency ratios (31). NMR spectra were processed using NMRPipe (32), FELIX (Accelrys, San Diego, CA) or VNMR (Varian, Palo Alto, CA). NMRView (33) was used for visualization and chemical shift assignments of NMR data.
NMR samples were prepared using lyophilized 15N,13C-enriched L14e which was dissolved in 700 μL of 90 % H2O/10% D2O or in 700 μL of 99.996% D2O. Final protein concentrations were approximately 1 mM, and the pH was adjusted to 5.0 using a Radiometer glass electrode with either HCl or NaOH for 90% H2O/10% D2O solutions, or DCl or KOD for 99.996% D2O solutions. No correction was made for the deuterium isotope effect on pH.
1H-15N residual dipolar coupling measurements were made at 11.7 T using partially aligned 15N labeled L14e in a liquid crystalline medium of n-alkyl-poly(ethylene glycol) and hexanol (34). Samples were prepared in 90% H2O/10% D2O with 5% C12E5 (Sigma) with a C12E5/hexanol molar ratio (r) of 1.0. Aligned spectra were obtained at 30 °C, and unaligned spectra were obtained at 30 °C after heating to 40 °C. Single bond 1H-15N RDCs were obtained using a 3D HNCO-IPAP pulse sequence (hnco_hn_coupling_notrosy_ipap_lek) kindly provided by Lewis Kay (University of Toronto). Raw data sets contained 844 data points in the 1H dimension and 42 increments in the 15N dimension. Data were extended by linear prediction by a factor of 2 in the 15N dimension and zero filled to obtain 2048 × 64 × 512 (1H × 13C × 15N) data points in the final spectrum. Residual dipolar couplings were extracted using in-house NMRView scripts.
[1H]-15N NOE and 15N T1 and T1ρ measurements were performed at 11.7 T using sensitivity enhanced gradient selected HSQC pulse sequences (35, 36). 1H saturation for [1H]-15N NOEs was obtained using a series of 120° 1H pulses with 5 ms separation. Saturation was performed for 5 s, with a total recycle delay of 10 s for both saturated and unsaturated experiments. Suppression of cross-correlation effects in 15N T1ρ experiments was obtained using a train of phase alternated, random length 1H CW pulses applied during the 15N spin lock (36). T1 delay times were 0.05, 0.10, 0.16, 0.22, 0.29, 0.36, 0.45, 0.55, and 0.66 s. T1ρ delay times were 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.08, and 0.10 s. R1 and R1ρ data were fit with the CURVEFIT module of MODELFREE (37). R2 relaxation rates were calculated from R1ρ rates as described by Korzhnev et al. (36). Duplicate [1H]-15N NOE experiments and duplicates of 15N T1 and T1ρ experiments at three different delay times were used to estimate uncertainties in peak intensities. The errors in measured R1, R2, and NOE values averaged 0.5, 1.0, and 1.9 %, respectively. The errors used for relaxation analysis were those measured or 2.5%, whichever was greater.
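The conversion of R1ρ to R2 and the error floor described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the commonly used relation R1ρ = R1 cos²θ + R2 sin²θ, where θ is the tilt angle of the effective field (tan θ = ω1/Ω), and the function names and the spin-lock and offset values are hypothetical. The second function applies the "measured or 2.5%, whichever was greater" rule for uncertainties.

```python
import math


def r2_from_r1rho(r1rho, r1, spin_lock_hz, offset_hz):
    """Remove the R1 contribution from an R1rho rate (all rates in s^-1).

    Assumes R1rho = R1*cos^2(theta) + R2*sin^2(theta), with
    tan(theta) = spin-lock field strength / resonance offset.
    """
    theta = math.atan2(spin_lock_hz, offset_hz)  # pi/2 when on resonance
    sin2 = math.sin(theta) ** 2
    return (r1rho - r1 * (1.0 - sin2)) / sin2


def analysis_error(value, measured_error, floor_fraction=0.025):
    """Use the measured error unless it is below 2.5% of the value."""
    return max(measured_error, floor_fraction * abs(value))


# Hypothetical numbers: offset equal to the spin-lock field gives theta = 45 degrees.
r2 = r2_from_r1rho(r1rho=6.0, r1=2.0, spin_lock_hz=1500.0, offset_hz=1500.0)
err = analysis_error(r2, measured_error=0.1)
```

On resonance (zero offset) the correction vanishes and R2 equals R1ρ; the correction grows for residues far from the spin-lock carrier.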
Backbone 1H, 13C, and 15N assignments were obtained using 1H-15N heteronuclear single quantum coherence (HSQC), HNCA, HNCO, HNCACB, CBCA(CO)NH, HN(CO)CA and HNHA spectra collected at 18.7 T on a uniformly 15N,13C-double-labeled protein sample dissolved in 90% H2O/10% D2O. Side chain assignments were obtained for 15N-13C-labeled L14e using HCC-TOCSY-NNH and CCC-TOCSY-NNH in 90% H2O/10% D2O, and HCCH-TOCSY and HCCH-COSY in 99.996% D2O. HBCBCGCDHD (38), 2D DQF-COSY, 2D NOESY and 1H,13C HSQC spectra were collected on a doubly labeled protein sample in 90% H2O/10% D2O to obtain aromatic proton and carbon assignments. An 15N-edited nuclear Overhauser effect spectrum (NOESY) (150 ms mixing time) on double labeled protein in 90% H2O/10% D2O, and a NOESY-13C HSQC experiment (150 ms mixing time) on protein in 99.996% D2O were collected for NOE measurements. Backbone amide protons involved in hydrogen bonding were delineated with a 1H,15N HSQC spectrum of 15N labeled sample protein immediately after dissolving in 99.996% D2O.
Initial structures of L14e were derived using ARIA 1.2 interfaced to CNS (39) with a random chain starting structure. Initial restraints consisted of hydrogen bond and dihedral angle restraints obtained from chemical shift indices, HNHA coupling constants, NH residual dipolar couplings (RDCs), a set of unambiguous backbone NOEs obtained from a 15N-edited NOESY spectrum, and sets of unassigned NOEs from 15N-edited and 13C-edited NOESY spectra. Initial values for the dipolar coupling (Da) and rhombicity (R) were determined from a powder distribution of RDCs (40). In each ARIA iteration, distance restraints were calibrated and NOE assignments of the NOESY crosspeaks were made by ARIA based on an ensemble of the lowest energy structures from the previous iteration. The top 5 ARIA structures were used to calculate new values for Da and R using Module (41), and the structures were averaged and used as a starting structure for Xplor-NIH refinement. Restraints at this stage consisted of the ARIA-determined distance restraints, CSI-derived torsional angles, HNHA couplings, and NH RDCs. Hydrogen bonding partners of the non-exchangeable NHs were identified using the top 10 out of 100 Xplor-NIH structures and used for subsequent refinements. T1 and T2 relaxation restraints that did not fall in suspected mobile regions, which were identified by significantly low [1H]-15N NOEs, were included in subsequent refinements, and another round of Xplor-NIH (from random starting structures) was performed in which the anisotropy parameters Da, D‖/D⊥, R, and τc were also refined as described below. The best 10 (out of 100) structures were averaged and used as a starting structure for a final round of Xplor-NIH refinement in which Da, D‖/D⊥, R, and τc were fixed to the values found in the previous iteration. This final refinement also included a disulfide bond between C10 and C22 and contributions from the Xplor-NIH Ramachandran force field, but omitted the ARIA ambiguous restraints.
The final restraints and the statistics on the ensemble of the best 10 (out of 100) structures are summarized in Table 1. The model with the lowest backbone RMSD to the average structure was taken as representative of the ensemble.
Xplor-NIH internal dynamics were used with an initial equilibration at 3000 K for random starting structures and 1000 K for refinements starting from a defined structure, with slow cooling to 100 K in 50 K temperature decrements, followed by energy minimization. During both the equilibration and cooling phases, distance restraint force constants were increased from 5 to 50 kcal/mol, and force constants for dipolar coupling and T1/T2 restraints were increased during the cooling phase from 0.01 to 1.0, and 0.5 to 5.0, respectively. To find optimal values of Da, D‖/D⊥, R (rhombicity), and the overall rotational correlation time τc, a grid search with rigid body rotation was implemented at each temperature change during the simulated annealing. The final values characterizing the molecular alignment tensor were a Da of -8.7 Hz and a rhombicity of 0.29. The rotational diffusion anisotropy defined by the T1/T2 restraints converged to a D‖/D⊥ of 0.95 with a rhombicity of 0.0 and an overall τc of 5.2 ns. The quality of the final structures was analyzed using PROCHECK-NMR. The coordinates and resonance assignments have been deposited in the Protein Data Bank (PDB ID code 2kds).
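To illustrate how the alignment tensor parameters relate to individual couplings, the sketch below evaluates a residual dipolar coupling for a given N-H bond orientation. It assumes the widely used convention D(θ, φ) = Da[(3cos²θ − 1) + (3/2) R sin²θ cos 2φ], with θ and φ the polar angles of the bond vector in the alignment tensor frame; the text does not state which convention was used, so this form and the function name are assumptions. Only the values Da = -8.7 Hz and R = 0.29 are taken from the text.

```python
import math

DA_HZ = -8.7        # axial component of the alignment tensor (from the text)
RHOMBICITY = 0.29   # rhombicity R (from the text)


def rdc_hz(theta, phi, da=DA_HZ, rhombicity=RHOMBICITY):
    """Residual dipolar coupling (Hz) for a bond vector at polar angles
    (theta, phi), in radians, in the principal frame of the alignment tensor."""
    axial = 3.0 * math.cos(theta) ** 2 - 1.0
    rhombic = 1.5 * rhombicity * math.sin(theta) ** 2 * math.cos(2.0 * phi)
    return da * (axial + rhombic)


# Extremes of the powder distribution under this convention.
d_zz = rdc_hz(0.0, 0.0)                    # bond along the z axis
d_yy = rdc_hz(math.pi / 2.0, math.pi / 2)  # bond along the y axis
d_xx = rdc_hz(math.pi / 2.0, 0.0)          # bond along the x axis
```

Under this convention the three principal values sum to zero, and their spread sets the range of couplings expected across the β-barrel.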
[1H]-15N NOE and 15N R1 and R2 relaxation data were analyzed using TENSOR2 kindly provided by M. Blackledge (http://www.ibs.fr/ext/labos/LRMN/softs/). The diffusion tensor elements were calculated using data for NHs only in α or β secondary structures with NOEs greater than 0.7, and R2/R1 ratios within 1 standard deviation of the mean. The principal elements of the diffusion tensor were as follows: Dx = 2.8 × 10⁷ s⁻¹, Dy = 3.5 × 10⁷ s⁻¹, and Dz = 3.5 × 10⁷ s⁻¹. Similar results were obtained using the quadric_diffusion program kindly provided by A. Palmer (Columbia University). Order parameters characterizing backbone dynamics were obtained with TENSOR2 by fitting the 15N relaxation data according to the “model-free” approach of Lipari and Szabo and anisotropic rotation defined by the above tensor elements.
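As a consistency check on the tensor elements quoted above, the overall correlation time can be estimated from the isotropic diffusion coefficient using the standard relation τc = 1/(6 Diso), where Diso is the mean of the three principal elements. The sketch below is illustrative only; the function names are hypothetical, and treating Dx as the unique axis (because Dy = Dz) is an assumption.

```python
def tau_c_ns(dx, dy, dz):
    """Overall rotational correlation time (ns) from principal diffusion
    tensor elements (s^-1), using tau_c = 1 / (6 * D_iso)."""
    d_iso = (dx + dy + dz) / 3.0
    return 1e9 / (6.0 * d_iso)


def axial_ratio(d_unique, d_a, d_b):
    """Ratio of the unique principal element to the mean of the other two."""
    return d_unique / (0.5 * (d_a + d_b))


DX, DY, DZ = 2.8e7, 3.5e7, 3.5e7  # values quoted in the text, s^-1

tau = tau_c_ns(DX, DY, DZ)        # about 5.1 ns
ratio = axial_ratio(DX, DY, DZ)   # 0.8
```

The resulting τc of about 5.1 ns is close to the 5.2 ns obtained during structure refinement.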
The S. solfataricus L14e gene was located in the genome database, cloned using PCR, and expressed in E. coli using standard procedures as described in Materials and Methods. SDS-PAGE of the recombinant protein indicated a molecular mass of approximately 12 kDa, and gel exclusion chromatography indicated a slightly larger mass of 13 kDa. Both are somewhat larger than the 10861 Da expected from the amino acid sequence. Mass spectrometry demonstrated a molecular weight of 10727, consistent with removal of the N-terminal methionine by methionine aminopeptidase in E. coli. Circular dichroism spectra of purified protein indicated that only about half of the protein was composed of regular secondary structure with 6 (±2) % α-helix, 34 (±9) % β-sheet, 9 (±4) % turns, and 53 (±1) % random coil and other structures.
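The mass bookkeeping above can be checked with simple arithmetic. The sketch below assumes that 10861 Da is the average mass of the full-length, fully reduced sequence including the initiating methionine, and uses standard average masses (131.19 Da for a methionine residue, 2.016 Da for the two hydrogens lost on forming a disulfide). These assumptions and the function name are not from the text, and the accuracy of the mass measurement is not stated, so the comparison is only indicative.

```python
MET_RESIDUE_MASS = 131.19  # average mass of a Met residue (C5H9NOS), Da
H2_MASS = 2.016            # two hydrogens lost per disulfide bond, Da


def expected_mass(full_length_mass, remove_met=True, n_disulfides=0):
    """Expected average mass (Da) after optional N-terminal Met removal
    and formation of n_disulfides disulfide bonds."""
    mass = full_length_mass
    if remove_met:
        mass -= MET_RESIDUE_MASS
    mass -= n_disulfides * H2_MASS
    return mass


SEQUENCE_MASS = 10861.0  # from the amino acid sequence (text)
OBSERVED_MASS = 10727.0  # from mass spectrometry (text)

reduced = expected_mass(SEQUENCE_MASS)                   # Met removed, no disulfide
oxidized = expected_mass(SEQUENCE_MASS, n_disulfides=1)  # Met removed, one disulfide
```

Removal of methionine alone gives about 10729.8 Da; including one disulfide gives about 10727.8 Da, which is within 1 Da of the observed value.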
The 1H-15N HSQC spectrum at 18.7 T was well resolved with 91 cross peaks, consistent with 95 residues and no contribution from 4 prolines, with one at the N-terminus (Figure 2). Backbone and side chain assignments were obtained using triple resonance experiments as described in Materials and Methods. Except for the C-terminal leucine, all backbone atoms and 96% of non-exchangeable side chain protons were assigned. Chemical shift indices indicated β-sheet secondary structure extending over residues 9-15, 20-23, 25-27, 31-36, 46-49, and 59-64, and α-helix over residues 69-79 and 82-86.
NOESY cross peak assignments were obtained using ARIA 1.2, resulting in 283 ambiguous and 1262 unambiguous NOESY cross peak assignments. Structures were obtained with Xplor-NIH using NOE distance constraints, chemical shift indices, hydrogen bonds, backbone scalar coupling constants, residual dipolar couplings, and 15N relaxation rates. Table 1 summarizes the restraints used along with statistics for the ensemble of the 10 lowest energy structures, shown in Figure 3A. The quality of the fit was excellent, with a 0.55 Å root mean square deviation (RMSD) for the backbone atoms in well-defined regions using the ten best structures, and an RMSD of 1.90 Å over the entire backbone. The quality of the structure was analyzed using PROCHECK (Table 1). A ribbon drawing of a representative structure (that with the lowest root mean square deviation relative to the average structure) is shown in Figure 3B.
Consistent with the circular dichroism spectrum, the NMR solution structure of L14e is largely β-sheet, with a little over half of the protein (residues 1-57) composed of a barrel core formed by 5 consecutive β-strands. The turn between the first two strands is an open loop of 8 residues with little or no hydrogen bonding. The second strand is 10 residues long and contains a β-bulge which serves to accentuate left-handed twisting of the sheet. Turns 2, 3, and 4 are tight turns, and the bond angles for the three are consistent with type I turns. A 310 helix bridges strands 4 and 5, allowing strand 5 to wrap around the core and complete the barrel with 2 hydrogen bonds with strand 1. Interestingly, proline 56 lies in the center of strand 5 and is well conserved throughout the archaeal L14e sequences (Figure 1). L14e contains two buried cysteines that lie adjacent to each other in the hydrophobic core on the first (C10) and second (C22) β-strands. The orientation of the side chains in the refined structure is consistent with a disulfide in the native fold, and both cysteines are well conserved in the archaeal L14e proteins (Figure 1).
The C-terminal portion of L14e (beginning at D58) contains a largely extended section which loops across the N-terminus of strand 1, resulting in an apparent knot. The first 10 residues stretch from the end of the last β-strand across strands 1, 2, and 3 to effectively support the β-sheet. Residues 69-79 form a helical region that consists mainly of α-helix with a tendency for 310 helix formation towards the C-terminal end of the helix. The helix and following extended chain contain a hydrophobic face composed largely of V72, L76, L81, M85, and I89 which pack against a hydrophobic patch on strands 1, 2, and 3 of the β-barrel. The importance of this interaction is reinforced by the conservation of these hydrophobic residues in the other archaeal L14e homologs (Figure 1). The packing of the C-terminal chain onto the surface of the sheet essentially doubles the size of the hydrophobic core of the protein.
The backbone RMSD of the ensemble of NMR structures (Figure 4) indicates regions of well-defined structure (e.g., β-strands) punctuated by regions of increased RMSD values corresponding to both N- and C-terminal sequences, turns 1, 2, and 3, and the end of the last β-strand. Turn 4 consists of 310 helix and is well defined with very little structural heterogeneity in the ensemble. 15N-1H residual dipolar coupling measurements exhibited pronounced sequence dependence (Figure 4) due to varying orientations of N-H bond vectors around the β-barrel. These were well fit by the structure refinement.
The sequence dependence of the 15N order parameters (S2) shows that increased RMSDs in some regions should not be attributed to flexibility, but are likely due to fewer constraints (Figure 4). S2 values range from 0.85 to 0.90 over much of the protein. In addition to the terminal regions, there were two regions with enhanced flexibility compared to the main chain. These occur around the β-bulge at turn 2 (residues 27-30), and in the extended chain (residues 62-65) which stretches across β-strands 1 and 2. 15N T1/T2 relaxation rates showed little sequence variation (data not shown) and were consistent with an axially symmetric structure, as were the results from the rotational diffusion anisotropy analysis of the relaxation data. The rotational correlation time of 5.2 ns obtained from refinement using the 15N relaxation data is consistent with a monomer with a molecular mass of 10.8 kD (i.e. τc (ns) ≈ MW (kDa)/2).
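The rule of thumb quoted above, τc (ns) ≈ MW (kDa)/2, makes the monomer assignment easy to check numerically. The sketch below is only an illustration of that approximation; the function name is hypothetical and the dimer case is included purely for comparison.

```python
def tau_c_estimate_ns(mw_kda):
    """Rule-of-thumb correlation time (ns) for a globular protein:
    tau_c (ns) ~ MW (kDa) / 2."""
    return mw_kda / 2.0


MONOMER_KDA = 10.8      # expected monomer mass (text)
OBSERVED_TAU_NS = 5.2   # from refinement against 15N relaxation data (text)

tau_monomer = tau_c_estimate_ns(MONOMER_KDA)      # about 5.4 ns
tau_dimer = tau_c_estimate_ns(2.0 * MONOMER_KDA)  # about 10.8 ns

# The observed value lies much closer to the monomer estimate.
closer_to_monomer = abs(OBSERVED_TAU_NS - tau_monomer) < abs(OBSERVED_TAU_NS - tau_dimer)
```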
As expected for a thermophile protein, L14e demonstrated significant thermal stability with a Tm of 102.8 °C (ΔHvh = 145 kcal/mol) at pH 7 and a protein concentration of 0.8 mg/ml (Figure 5). Thermal unfolding was reversible at pH 7 as judged by the reproducibility of DSC scans to the Tm. Decreasing the pH to 5 and below led to baseline artifacts and irreversible unfolding, making it impossible to obtain an estimate of the ΔCp of unfolding using the Kirchhoff relation. Surprisingly, the Tm was slightly dependent on protein concentration, with the Tm decreasing with decreasing protein concentration (e.g. 100.8 °C at 0.4 mg/ml, and 100.2 °C at 0.2 mg/ml). Such behavior is indicative of the presence of higher order oligomers. This is consistent with the observed ΔHcal/ΔHvh ratio of about 0.3, and also with gel exclusion chromatography, which indicated a molecular mass of 13 (± 0.3) kD (data not shown), somewhat larger than the expected 10.8 kD. However, as indicated above, 15N relaxation data demonstrated that L14e was monomeric at 35 °C with a τc of 5.2 ns. In addition, no significant change in 1H linewidth or 1H,15N HSQC peak positions was observed with increasing temperature to 80 °C. This would indicate that the primary species was monomeric up to at least 80 °C. We interpret these data to indicate that L14e exists predominantly as a monomer, with a tendency to form higher order oligomers with increasing temperature, especially near 100 °C.
To investigate a possible contribution of a C10-C22 disulfide to thermal stability, single (C22A) and double (C10A/C22A) cysteine to alanine mutations were created. CD spectra demonstrated that the alanine substitutions resulted in negligible change in structure. In addition, 1H,15N HSQC spectra of the mutant proteins showed only minor differences except at the mutated residues, indicating that the structures of the proteins were not significantly altered by the substitutions. DSC scans of the mutant proteins were reversible at pH 7, and irreversible below pH 5, similar to the native protein. The single C22A mutant showed a decrease in Tm to 90.6 °C and a ΔHvh of 97.2 kcal/mol (0.6 mg/ml). The thermodynamics of unfolding the double mutant C10A/C22A showed little additional change, with a Tm of 90.8 °C (pH 7, 0.4 mg/ml) and a ΔHvh of 98.7 kcal/mol. The DSC data for both could be fit well with a ΔCp of 1000 cal/deg/mol.
The structure of L14e is the first structure of any member of the L14e family to be reported in the literature. To our knowledge, it is also the first structure described of a member of the COG2163 group, which includes L6e and L27e. A BLAST search of non-redundant (NCBI) genomic protein sequences using the protein sequence for S. acidocaldarius L14e (YP_255473) resulted in more than 66 homologous sequences (alignment scores greater than 50). Most of these have been annotated as members of the L14e family of proteins. Sequence comparisons indicate that the structures of the N-terminal domains of these proteins should be similar to L14e, i.e. a classic SH3 (Src-homology region 3) domain fold with a β-barrel composed of a three-stranded anti-parallel β-sheet packed orthogonally across a second three-stranded β-sheet which shares one strand (42-44). Many of these proteins contain C-terminal sequences with significant differences in length and expected domain folds. For example, the human L14e C-terminal sequence contains a bZIP domain along with a nuclear targeting sequence.
While this work was in progress, the coordinates for a lower resolution structure of L14e were deposited in the Protein Data Bank (2JOY) based on a smaller NMR data set (viz. NOE distance constraints and dihedral angles). The overall folds of the two structures are similar with a backbone RMSD of 3.8 Å (Figure 6). Comparison of the SH3 domains shows the largest deviation in the loop between the first and second strands. The 15N relaxation data presented here indicate that this region is no more flexible than the adjacent strands. The difference is therefore not likely to be due to flexibility, but rather to differences in the number of NMR constraints defining the region. The use of RDCs in the structure refinement presented here significantly increases the accuracy of the refinement. This is true also in the C-terminal region beyond T57, i.e. beyond the SH3 fold, where the most significant difference is in the length of the helical regions.
SH3 domains have been extensively characterized because of their importance in protein-protein interactions (45, 46) with representative examples including Src (47), Csk (48), Lck (49), Fyn (50) and Abl (50) tyrosine kinase domains, α-spectrin (43), and the adapter protein SEM-5 (51). An SH3 fold has been observed in a number of other ribosomal proteins, e.g. L2 in Bacillus stearothermophilus (52), and L19 in Thermus thermophilus (6). Sequence alignments of L14e with L2 and L19 demonstrate regions of similarities that are most likely important in defining the SH3 fold. Notably, similar sequence similarities were not observed for other SH3 domain proteins such as Src, Csk, and Fyn.
SH3 domains typically recognize short (3-9 residue) proline rich sequences with moderate affinity (e.g. a 0.001 M Kd) (46). The peptide binding site is usually a surface hydrophobic pocket defined by three loops: the RT-loop (between the first and second β-strands), the nSrc loop (between strands 2 and 3), and a 310 helix (between strands 4 and 5). Interestingly, L14e contains a hydrophobic surface region defined by an nSrc loop and a 310 helix. However, the RT-loop normally found in SH3 protein binding sites is absent and a tight turn is observed instead (Figure 7). The lack of an RT-loop would appear to make it difficult to define a surface with the appropriate pockets for a specific peptide ligand. However, this may not be necessary in the ribosome where multiple interactions could lead to the necessary specificity.
The stabilities of SH3 domains have been the focus of a number of studies (Fyn (53), Abl (54), α-spectrin (55), btk (56), itk (56), tec (56), drk (57), and src (58)) which have demonstrated that the fold can achieve a wide range of stability. For example, drk SH3 is marginally stable in water with significant population of the unfolded state at 25 °C, while most SH3 domains have Tm values near 70 °C. There have been few studies characterizing the stability of an SH3 fold in a thermophile protein. Sac7d and Sso7d adopt a fold that is similar to the SH3 domain, but labeling these as SH3 proteins is probably inaccurate due to the absence of a fifth β-strand which packs against strand 1 (Figure 7). Unfortunately, a rigorous characterization of the thermodynamics of unfolding L14e is not possible due to the inability to obtain an accurate ΔCp. Using the Tm as a measure of stability indicates that the Sulfolobus L14e is significantly more stable than any SH3 studied to date. The increased stability can be attributed in part to the presence of an internal disulfide. In addition, the C-terminal helix packs against a hydrophobic patch on the surface of the SH3 domain to significantly increase the size of the hydrophobic core, with I63, A67, V72, L76, L81, and I89 packing against I9, V23, V25, L34, and T36. The C-terminal “meandering” sequence significantly augments the structure and more than doubles the hydrophobic core of the SH3 fold (composed of V6, C10, C22, I24, I27, V33, V35, V44, V49, L54).
The L14e sequence from R8 to D30 shows similarity to the KOW motif, a putative RNA-binding sequence identified in the NusG bacterial transcription factor and also ribosomal protein families L24p, L26e, and L27e (59). The KOW motif is 27 residues long and appears in the NusG domain III from E196 to K222, which encompasses the first and second β-strands of an SH3 fold (60). Similarly, the R8-D30 sequence of L14e occurs in the first two β-strands of the SH3 domain. The RNA binding interactions in NusG have been attributed to electrostatic interactions with positively charged residues in the loop regions preceding, between, and following the two strands based on the L24p structure in the bacterial ribosome (60). Whether or not these can be generalized to other KOW sequences is not clear. The KOW sequence does not in general present a positively charged surface (61), although it has been pointed out that RNA interactions can occur with negatively charged protein surfaces that would normally be thought to prevent binding (60). Many conserved KOW residues (e.g. the central glycines) are more likely involved in defining the SH3 fold than in making interactions with nucleic acid. A large portion of the outer face of the two strands in the KOW sequence in L14e is not only positively charged but contains significant hydrophobic character as well (V6, I9, V11, V23, V27, I29). This face is largely occluded by association with the C-terminal sequence. Of course the C-terminal region may adopt a different structure in complex with the ribosome to permit the KOW strands to interact with RNA. Interestingly, sequence homology between L14e and L24e indicates that L24e could also adopt an SH3 fold with additional protein sequence beyond the final β-strand of the SH3 fold.
The L14e sequence alignments shown in Figure 1 demonstrate a remarkably well-conserved stretch of four positively charged residues in β-strand 4 of the archaeal proteins (45KRRR48 in S. solfataricus) (Figure 8). This is a classic NLS sequence which has been demonstrated to be sufficient to drive protein localization to the nucleus in eukaryotes (62-66). Since archaea do not have nuclei, this cannot be the role of this sequence here. NLS motifs have been noted in ribosomal proteins (e.g. S12p and S17p) of bacteria, which also lack nuclei (66). In these cases it has been argued that these “pre-existing” basic motifs may have been utilized by early eukaryotic cells for NLS sequences. The fact that the sequence is not conserved in the eukaryotic examples of L14e would argue that this 4 residue basic motif did not serve as an evolutionary precursor to an NLS in L14e. Rather it seems likely that the role is to form a highly positively charged surface on L14e that is important in defining the structure of the ribosome.
As noted in the introduction, the amino terminus of L14e is nearly identical to that of the Sac8b protein described by Dijk et al. (23). Using the genome sequences for S. acidocaldarius (67), S. tokodaii (68), and S. solfataricus (27), we have searched for Sulfolobus sequences coding for 8 kD proteins with N-terminal sequences similar to the sequence reported by Dijk et al. (23). Surprisingly, we find none. In fact, the most likely candidates are L14e proteins: YP_255473 in S. acidocaldarius, NP_376274.1 in S. tokodaii, and NP_341932 in S. solfataricus, all of which have N-terminal sequences homologous to the 17-residue sequence reported by Dijk et al. (23) except for the initiating methionine (Figure 1). L14e from S. acidocaldarius shows four differences that could be due to either strain differences or errors in sequencing. Sac8b was reported to be present in low amounts, and the possibility of sequence errors due to the presence of other proteins cannot be assessed. We note that the lower than expected molecular weight reported for Sac8b is apparently real, since recombinant L14e does not migrate on SDS gels between Sac7d and Sac10a as reported for Sac8b (23). The structure indicates that proteolysis in the vicinity of the solvent exposed C-terminal helix (e.g. at K73) would result in a protein with an approximate molecular mass of 8 kD. Whether or not this is the case will require more detailed characterization of native Sac8b from S. acidocaldarius.
This work was supported by grant GM49686 from the National Institutes of Health to JWS and SPE. Atomic coordinates and NMR data have been deposited in PDB (2kds) and RCSB (100995).