The genus Vibrio consists of gram-negative, motile, rod-like bacteria capable of both respiratory and fermentative metabolism.1 Vibrios are common in surface waters worldwide and occur in both marine and freshwater sources. V. cholerae is an important human pathogen transmitted by water or food that affects the small intestine through enterotoxin secretion. Although cholera has been virtually eliminated in the U.S. due to modern water and sewage treatment systems, it remains a serious health threat in developing countries with poor sanitation and limited health care. The complete genome sequence of the pathogenic V. cholerae strain N16961 has been determined and consists of two circular chromosomes with 3890 open reading frames (ORFs).2 Genes for essential cell functions such as transcription, translation, cell-wall biosynthesis, and DNA replication, as well as genes for pathogenicity, are located on the larger chromosome. The genomic sequence of V. cholerae provides an important starting point for understanding how this widespread environmental microorganism remains a persistent human pathogen.2 Applying a structural genomics survey to this genome can provide new potential targets for drug and therapeutics development. The posttranslational Nα-terminal acetylation of proteins is very common in eukaryotes, but so far has been reported for only a few bacterial proteins.3 In this article, we report the crystal structure of the VC1889 protein, a putative acetyltransferase from V. cholerae, determined at 1.7 Å resolution, and discuss its potential function.
The ORF of this protein from V. cholerae was amplified from genomic DNA with KOD DNA polymerase using conditions and reagents provided by the vendor (Novagen, Madison, WI). The gene was cloned into the pMCSG7 vector4 using a modified ligation independent cloning protocol.5 This process generated an expression clone producing a fusion protein with an N-terminal His6 tag and a TEV protease recognition site (ENLYFQ↓S). The fusion protein was over-produced in an E. coli BL21-derivative harboring a plasmid encoding three rare E. coli tRNAs: Arg [AGG/AGA] and Ile [ATA]. A selenomethionine (SeMet) derivative of the expressed protein was prepared as described by Walsh et al.6 The protein was purified from a resuspension of IPTG-induced bacterial cells in binding buffer (500 mM NaCl, 5% glycerol, 50 mM HEPES, pH 8.0, 10 mM imidazole, and 10 mM β-mercaptoethanol). The cells were lysed by the addition of lysozyme at 1 mg/mL in the presence of a protease inhibitor cocktail (Sigma P8849) (0.25 mL/5 g cells) and sonicated on ice for 3 min. After clarification by centrifugation (30 min at 30,000g) and passage through a 0.2 µm filter, the lysate was applied to Ni-HiTrap Sepharose HP resin (Amersham) and unbound proteins were removed by washing with 10 volumes of binding buffer. The fusion protein was eluted from the column with 250 mM imidazole in buffer (50 mM HEPES, 500 mM NaCl, 5% glycerol, 10 mM β-mercaptoethanol). The fusion tag was then cleaved with recombinant His6-tagged TEV protease.7 The cleaved protein was purified from the His6-tag, undigested protein and His6-tagged TEV protease by application of the solution to a Ni-NTA column (Qiagen). The purified protein was dialyzed in 20 mM HEPES pH 8.0, 250 mM NaCl, 2 mM DTT and concentrated using Centricon Plus-20 (Millipore, Bedford, MA).
The protein was crystallized by vapor diffusion in sitting drops by mixing 1 µL of the protein solution (17 mg/mL) with 1 µL of reservoir solution (0.1 M Bis-Tris propane pH 7.0, 4 M NaNO3) and equilibrated at 289 K over 500 µL of this solution. Crystals appeared after 24 h. Prior to data collection, crystals were flash-frozen in liquid nitrogen with reservoir solution supplemented with 28% sucrose as a cryoprotectant. The crystals belong to the orthorhombic space group P2₁2₁2 with unit cell dimensions of a = 53.05 Å, b = 103.07 Å, c = 37.86 Å, and diffracted beyond 1.7 Å Bragg spacing.
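As a rough plausibility check on the cell parameters above, the unit cell volume and the Matthews coefficient can be estimated. The sketch below is not part of the reported analysis. It assumes one molecule per asymmetric unit, four asymmetric units per cell for this orthorhombic space group, and an average residue mass of 110 Da for the 178-residue protein. The function names are illustrative.

```python
# Rough estimate of unit cell volume, Matthews coefficient, and solvent content.
# Cell edges and residue count come from the text; everything marked
# "assumption" is NOT stated in the article.

A, B, C = 53.05, 103.07, 37.86   # unit cell edges in angstroms (from the text)
N_RESIDUES = 178                 # protein length (from the text)
AVG_RESIDUE_MASS_DA = 110.0      # assumption: typical mean residue mass
ASU_PER_CELL = 4                 # general positions in space group P2(1)2(1)2
MOLECULES_PER_ASU = 1            # assumption: one monomer per asymmetric unit


def cell_volume(a, b, c):
    """Volume of an orthorhombic cell (all angles 90 degrees), in cubic angstroms."""
    return a * b * c


def matthews_coefficient(volume, molecule_mass_da, molecules_per_cell):
    """Crystal volume per unit of protein mass, in cubic angstroms per dalton."""
    return volume / (molecule_mass_da * molecules_per_cell)


def solvent_fraction(vm):
    """Approximate solvent fraction using the common 1.23 / Vm relation."""
    return 1.0 - 1.23 / vm


volume = cell_volume(A, B, C)
mass = N_RESIDUES * AVG_RESIDUE_MASS_DA
vm = matthews_coefficient(volume, mass, ASU_PER_CELL * MOLECULES_PER_ASU)
solvent = solvent_fraction(vm)

print(f"Cell volume: {volume:.0f} A^3")
print(f"Matthews coefficient: {vm:.2f} A^3/Da")
print(f"Estimated solvent fraction: {solvent:.2f}")
```

Under these assumptions the Matthews coefficient comes out near 2.6 Å³/Da, which is within the range commonly seen for protein crystals and is consistent with a single monomer in the asymmetric unit.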
Two-wavelength anomalous diffraction (MAD) data were collected at 100 K at the 19ID beamline of the Structural Biology Center at the Advanced Photon Source, Argonne National Laboratory. The data were recorded on an ADSC Quantum 315 detector. Peak and inflection point energies were determined from the X-ray absorption spectrum of the SeMet-labeled crystal. All data were collected from a single crystal using inverse beam geometry at the peak and inflection point energies. The crystal was exposed for 3 s per 1.0° rotation of Ω with a crystal-to-detector distance of 220 mm. Data collection strategy, integration, and scaling were performed with the HKL3000 package.8 A summary of crystallographic information can be found in Table I.
All three expected selenium sites were located, and initial phasing and density modification were calculated with an early version of HKL3000. This largely automated structure solution program package includes SHELXC, SHELXD, SHELXE, MLPHARE, DM, and SOLVE/RESOLVE.9–12 Final phasing and model building were accomplished with the autoSHARP suite13 including ARP/wARP.14 Model completion and adjustments to the structure were accomplished with COOT.15 A total of 174 out of 178 residues were modeled; residues 42–45 are missing due to a lack of electron density. The model was subjected to iterations of refinement, manual correction, and solvent addition, achieving a final R factor of 21.1% and R free of 24.4%. TLS and restrained refinement in Refmac516 were performed against peak data to 1.7 Å, with Bijvoet pairs averaged, using Hendrickson-Lattman coefficients from autoSHARP prior to density modification. The stereochemistry of the structure was analyzed with PROCHECK17 and MolProbity.18 All residues were in allowed regions of the MolProbity Ramachandran plot, with 97.62% of these in favored regions. Atomic coordinates and experimental structure factors have been deposited in the Protein Data Bank (PDB) and are accessible under the code 2FCK. Refinement details are provided in Table I.
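For readers less familiar with the refinement statistics quoted above, the conventional crystallographic R factor is the summed absolute difference between observed and calculated structure factor amplitudes divided by the summed observed amplitudes. R free applies the same formula to a test set of reflections held out of refinement. The following is a minimal sketch of that definition only. It does not reproduce how Refmac5 scales or weights the data, and the amplitudes in the example are invented.

```python
def r_factor(f_obs, f_calc):
    """Conventional R factor: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    f_obs and f_calc are equal-length sequences of structure factor
    amplitudes for the same reflections, assumed to be on a common scale.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    return numerator / denominator


# Invented amplitudes, purely for illustration.
work_obs, work_calc = [10.0, 20.0, 30.0], [9.0, 22.0, 30.0]
print(f"R = {100 * r_factor(work_obs, work_calc):.1f}%")
```

Calling the same function on the held-out test reflections gives R free; a gap of a few percentage points between the two, as reported here, is typical.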
VC1889 forms a compact globular α/β protein with a twisted S-shaped, 8-stranded β-sheet at its core and five α-helices and three 3₁₀ helices arranged in three layers [Fig. 1(a)]. The sheet can be separated into two antiparallel sections: β1–β5 and β6–β8. The sections are joined by hydrogen bonding between parallel strands β5 and β6, which veer apart at their C-terminal ends. Helices η1, α1, η2 and α2 lie above the sheet while helices α3, α4 and α5 are below. The short 3₁₀ helix η3 is in the loop between strands β4 and β5. A disordered loop occurs between η2 and α2. Also found in the electron density maps were 12 nitrate ions scattered through the structure and a glycerol molecule. A cavity exists between the β-sheet and helices η1, α1, η2 and α2, with several conserved residues lining it [Fig. 2(a)]. Interestingly, only one side of the cavity is lined with conserved residues, while the other side is sequence-divergent and includes the potential catalytic residue Cys139, which is not conserved. This may suggest that these proteins use a common cofactor with a divergent set of substrates.
The structure was submitted to ProFunc,19 a server that performs a number of sequence and structure based analyses, in order to identify putative functions. Sequence comparison provides strong supporting evidence for acetyltransferase activity. A BLAST20 search for this protein sequence resulted in no identical sequences. However, the sequence shares over 50% identity with a protein from V. vulnificus assigned as ribosomal-protein-serine acetyltransferase (Uniprot entry Q8DAW1). A search against the superfamily library of Hidden Markov Models21,22 derived from the SCOP (Structural Classification of Proteins) database23 found one motif matching the sequence. The match occurred between residues 1–4, 6–74, and 77–178 and identifies the characteristic motif of the Acyl-CoA N-acyltransferases (Nat) superfamily (no. 55729). An InterProScan search,24 which attempts to find sequence motifs in a database of protein families, domains, and functional sites, produced five matching sequence motifs to the protein. The matching motifs were: PD451850 and PD338839 (Q9KQV9_VIBCH_Q9KQV9), PF00583 (Acetyltransf_1), SSF55729 (Acyl-CoA N-acyltransferases Nat), and motif G3D.3.40.630.30 (CATH classification: transferase).
Structural analysis using SSM25 revealed an identical fold match to a number of ribosomal Nα-protein acetyltransferases with good Z-scores (between 6 and 8), although the sequence similarity between structures is less than 25%. A multiple structure superposition identified a putative coenzyme A binding site [Fig. 1(b)]. We have compared the structure of VC1889 with the structure of RimL Nα-acetyltransferase (PDB ID = 1s7n)26 [Fig. 2(b)]. There are some differences in the VC1889 structure, including helix α3. This region is a loop in RimL and is involved in cofactor binding [Fig. 2(c)]. There is also a significant shift of the hairpin (β7 and β8) and of the loop region (corresponding to residues 160–170), which appears to be close to the active site where the acetyl group attaches to coenzyme A; its exact conformation therefore has a bearing on substrate specificity. It was reported that acetyltransferases undergo some conformational changes upon binding of coenzyme A, so these differences may correspond to differences between the apo and liganded forms of the protein. On the other hand, they may also represent an adaptation to a different acetylation target.
The RimL protein is a dimer. VC1889 shares a similar dimer interface with a symmetry mate within the orthorhombic crystal. Whereas the different monomers superpose with an approximate RMSD of 2.1 Å, a 30–40° twist between members of the dimer brings the β4–β5 hairpin into the proposed RimL active site trough26 and raises the overall RMSD to 3.4 Å. An analysis of the VC1889 protein dimer interface with the PQS server,28 however, suggests that this protein is monomeric.
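The RMSD values quoted above are root-mean-square deviations between equivalent atoms of two structures after superposition. A minimal sketch of that final calculation is given below. It assumes the two coordinate lists are already superposed and paired atom by atom, it does not perform the superposition or residue matching that a tool such as SSM carries out, and the coordinates in the example are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired, pre-superposed atom sets.

    Each argument is a sequence of (x, y, z) tuples in angstroms; entry i of
    coords_a is taken to be equivalent to entry i of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        squared_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Invented C-alpha positions: the second set is the first shifted 2 A along z.
set_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
set_b = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0), (7.6, 0.0, 2.0)]
print(f"RMSD = {rmsd(set_a, set_b):.1f} A")
```

Because the value depends on which atoms are paired, the monomer-only and whole-dimer superpositions described above give different RMSDs for the same pair of structures.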
To identify possible ligand specificities, the template-based approaches of ProFunc were investigated. A number of ligand binding template matches to proteins binding coenzyme A and acetyl coenzyme A were found, but the superposition of the structures is poor. Examination of the reverse template results from ProFunc12 reveals that almost all hits are to acetyltransferase proteins. Of particular interest are the top two matches to the apo form of RimL-ribosomal L7/L12 Nα-protein acetyltransferase (PDB ID = 1s7f and 1s7k). Although RimL shares only 21.3% sequence identity (38% similarity) with VC1889, the two proteins have a high structural similarity. The residues used to create a template from the query structure are identified as Tyr100, Trp101, and Ala147. These residues bind the coenzyme A ligand in the Nα-protein acetyltransferases close to the point where the acetyl group is connected [Fig. 2(c)]. Further investigation of the local environment around this template match shows 18 identical and 15 similar residues within 10 Å of the template match, giving a local sequence identity of 40%. This is a prime illustration that, although the global sequence has significantly diverged, the overall fold is retained and the local similarity at the functionally important site remains high. The actual substrate for the acetyltransferase is not known; however, the most likely function is that of a ribosomal Nα-protein acetyltransferase.
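The local identity figure above can be reproduced with simple arithmetic. The text gives 18 identical and 15 similar residues and a local identity of 40%, which is consistent with about 45 aligned residues in the 10 Å neighbourhood; that total is inferred here and is not stated in the text. The sketch below also assumes that "similarity" counts identical plus similar residues, which is one common convention but not necessarily the one ProFunc uses.

```python
def local_identity(n_identical, n_similar, n_aligned):
    """Return (percent identity, percent similarity) for an aligned region.

    Similarity is taken as identical plus similar residues over the aligned
    total; this convention is an assumption, not taken from the article.
    """
    if n_aligned <= 0:
        raise ValueError("n_aligned must be positive")
    if n_identical + n_similar > n_aligned:
        raise ValueError("identical + similar cannot exceed aligned residues")
    identity = 100 * n_identical / n_aligned
    similarity = 100 * (n_identical + n_similar) / n_aligned
    return identity, similarity


# 18 identical and 15 similar are from the text; 45 aligned is inferred.
identity, similarity = local_identity(18, 15, 45)
print(f"Local identity: {identity:.0f}%, local similarity: {similarity:.0f}%")
```

On these numbers the local identity is roughly double the global value of 21.3%, which is the contrast the paragraph above draws.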
We wish to thank all members of the Structural Biology Center at Argonne National Laboratory for their help in conducting the experiments.
Grant sponsor: National Institutes of Health; Grant numbers: GM62414 and GM074942; Grant sponsor: U.S. Department of Energy, Office of Biological and Environmental Research; Grant number: DE-AC02-06CH11357.
The U.S. Government retains for itself, and others acting on its behalf, a paid-up, nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.