The crystal structure for cce_0566 (171 aa, 19.4 kDa), a DUF269 annotated protein from the diazotrophic cyanobacterium Cyanothece sp. ATCC 51142, was determined to 1.60 Å resolution. Cce_0566 is a homodimer with each molecule composed of eight α-helices folded on one side of a three-stranded anti-parallel β-sheet. Hydrophobic interactions between the side chains of largely conserved residues on the surface of each β-sheet hold the dimer together. The fold observed for cce_0566 may be unique to proteins in the DUF269 family; hence, the protein may also have a function unique to nitrogen fixation. A solvent accessible cleft containing conserved charged residues near the dimer interface could represent the active site or ligand-binding surface for the protein’s biological function.
The conversion of one molecule of dinitrogen (N2) to two molecules of bioavailable ammonia (NH3) is a key component of the global biogeochemical nitrogen cycle. Excluding man-made nitrogen fixation via the industrial Haber–Bosch reaction, 99% of the total N2 fixed per year occurs via biological processes performed exclusively by diazotrophic microorganisms. The enzyme responsible for catalyzing the chemically difficult reduction of the N2 triple bond under ambient conditions is a multimeric metal-bound complex, nitrogenase [1, 3, 4]. Four different types of nitrogenases have been identified that differ by the metal composition at the active site. Despite identification of the proteins that compose the nitrogenase complex, the mechanism of enzymatic nitrogen fixation is still the subject of much research. Indeed, interest in fully understanding the biological nitrogen fixation process may be at an all-time high because a byproduct of nitrogen fixation is dihydrogen (H2), an appealing source of renewable and green energy.
The only living organisms capable of fixing nitrogen are a select number of bacterial and archaeal genera and include species of cyanobacteria, phototrophic organisms that were the progenitors of chloroplasts in plants and algae [8, 9]. Through the release of the photosynthesis byproduct, oxygen, cyanobacteria radically altered the composition of early Earth’s reducing atmosphere into an oxidizing atmosphere 2.3 billion years ago [10, 11]. However, nitrogen fixation is biochemically intolerant of oxygen because nitrogenase is rapidly and permanently inactivated by it [12–14]. To aid in the transition from an anaerobic to an aerobic environment, some cyanobacteria evolved metabolic and regulatory processes to perform both photosynthesis and nitrogen fixation. In the marine diazotrophic cyanobacterium Cyanothece sp. ATCC 51142 this was achieved by the temporal separation of photosynthesis and nitrogen fixation into daytime and nighttime activities, respectively, making cyanobacteria the simplest known organisms to display circadian rhythms [16, 17].
The genome of Cyanothece 51142 was recently sequenced and deposited into the GenBank database (Accession Nos. CP000806–CP000811). Present in the conserved syntenic nif cluster of 34 genes is cce_0566, which falls into a family of proteins with a “Domain of Unknown Function” annotated DUF269 (PF03270). To date, the DUF269 gene has only been observed in nitrogen-fixing species, suggesting that it plays a role, as yet undefined, in the biosynthesis, assembly, transport, and insertion of components of the nitrogenase complex. The cce_0566 gene is located between nifX (cce_0565) and a gene encoding a protein that falls into the DUF683 family (cce_0567) [20, 21]. To obtain clues as to the biochemical function of proteins in the DUF269 family of conserved proteins [22, 23], the crystal structure for cce_0566 was determined to a resolution of 1.60 Å.
The cce_0566 gene from Cyanothece sp. ATCC 51142, minus the 18-residue, N-terminal signal sequence (MFIDNGNALIVIVIMTTT), was synthesized between NdeI and BamHI restriction endonuclease sites (BIOS&T, Montreal, Canada) and inserted into the expression vector pET28b (Novagen, Madison, WI). The 19.3 kDa gene product contained a 21-residue tag (MGSSHHHHHHSSGLVPRGSHM) prior to the first native residue (T19) that included a poly-histidine stretch to assist protein purification by metal-affinity chromatography. The recombinant plasmid was transformed into Escherichia coli BL21(DE3) cells (Novagen, Madison, WI) using a heat shock method. Nitrogen-15 substituted protein was prepared using standard minimal media protocols with 15NH4Cl and isopropyl β-d-1-thiogalactopyranoside induction at an OD600 of ~0.8 at 25 °C. Expressed protein from 750 mL of growth media was processed sequentially in two steps involving Ni–NTA (Qiagen, Valencia, CA) affinity chromatography followed by size exclusion chromatography on a Superdex75 HiLoad 26/60 column (Amersham Pharmacia Biotech, Piscataway, NJ) that simultaneously exchanged cce_0566 into the buffer used for the crystallization, CD, and NMR experiments (500 mM NaCl, 20 mM TrisHCl, 1.0 mM dithiothreitol, pH 7.1).
An overall rotational correlation time (τc) for a 15N-labeled cce_0566 sample (~1 mM) was rapidly estimated from backbone amide 15N T1/T1ρ ratios measured using a modified 1H–15N HSQC experiment to record a 15N-edited one-dimensional spectrum. Data were collected at 25 °C on a Varian Inova-600 spectrometer equipped with a triple resonance cryoprobe and pulsed field gradients.
Circular dichroism data were collected on an Aviv Model 410 spectropolarimeter (Lakewood, NJ) calibrated with an aqueous solution of ammonium d-(+)camphorsulfonate using a 13 µM cce_0566 sample in NMR buffer and in a quartz cell of 0.1 cm path length. A thermal denaturation curve was obtained by recording and plotting the ellipticity at 215 nm in 2.0 °C intervals from 5 to 80 °C. Steady-state wavelength spectra for cce_0566 were recorded in 0.5 nm increments between 200 and 250 nm at 25 °C. Each wavelength spectrum was the result of averaging two consecutive scans with a bandwidth of 1.0 nm and a time constant of 1.0 s. Steady-state wavelength spectra were processed by subtracting a blank spectrum from the protein spectrum and then automatically line smoothing the data using Aviv software.
Crystallization conditions for 15N-labeled cce_0566 were identified using screens from Hampton Research (Aliso Viejo, CA) and the hanging-drop, vapor-diffusion method set up at room temperature (~22 °C). Crystals began appearing 24–48 h later under one condition that was then optimized by adjusting the protein concentration and crystallization protocols. The data used to determine the structure for cce_0566 were obtained from plate-like crystals grown by microbatch under paraffin oil. Crystals were harvested approximately two weeks after mixing 1.5 µL of protein (~2 mg/mL) with 1.5 µL of reservoir buffer containing 0.2 M ammonium acetate, 0.1 M sodium acetate trihydrate, and 30% (w/v) polyethylene glycol 4000, pH 4.6. Crystals were directly mounted in nylon CryoLoops (Hampton Research), flash-frozen in liquid nitrogen, stored under liquid nitrogen, and shipped to the National Synchrotron Light Source (NSLS) at Brookhaven National Laboratory for X-ray data collection.
Native X-ray diffraction data were collected at the X29A beamline with an ADSC Q315r CCD detector. A solution was determined by molecular replacement using MOLREP from the CCP4 suite and the crystal structure of a related protein, B7JA91_ACIF2 (YP_002425942.1) from Acidithiobacillus ferrooxidans (PDB ID 3G7P). The final model was generated after numerous iterative rounds of refinement in REFMAC (v 5.5.0109). The stereochemical quality of the final model was assessed using the programs MolProbity and PROCHECK, and any conflicts were addressed. The data collection and structure refinement statistics are given in Supplementary Table S1 and the coordinates were deposited in the Protein Data Bank (PDB ID 3NJ2). The amino acid sequence for the coordinates of cce_0566 deposited in the RCSB PDB is numbered sequentially, G1–L174, beginning with the 21-residue, non-native tag. Here, the protein is numbered following the order in the native sequence, and hence, the first native residue, T22 in the RCSB PDB, is labeled T20.
The first indication that DUF269 formed a dimer in solution was its elution time off a size exclusion column consistent with an ~36 kDa protein (data not shown). This was corroborated by an estimated rotational correlation time (τc) of 18.1 ± 2.7 ns (25 °C) that was more consistent with an ~36 kDa dimer than an ~18 kDa monomer. Further evidence for dimer formation was the observation of broad line shapes, atypical for a < 20 kDa protein, for most of the amide cross peaks in the 1H–15N HSQC spectrum of cce_0566 (data not shown). Consequently, the dimer observed in the asymmetric unit of the crystal structure of cce_0566, shown in Fig. 1, likely represents the species present in solution. The buried surface area due to the formation of such a dimer, 1505 Å2, is 8.3% of the solvent-accessible surface and adequate to stabilize this species. There is some asymmetry between the two molecules in the dimer, as molecules A and B in the asymmetric unit have a backbone RMSD of 0.73 Å and an all-atom RMSD of 1.73 Å over residues S21–L171 (superposition server SuperPose, v 1.0). The variations are most prominent in the loops and turns in the structure, as the backbone and all-atom RMSDs reduce to 0.51 and 1.36 Å, respectively, when only the α-helical and β-strand regions are used for the alignment.
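As a rough plausibility check on the monomer-versus-dimer argument above, the rotational correlation time expected for a rigid sphere can be estimated from the Stokes–Einstein–Debye relation. The sketch below is not the procedure used in this work; it assumes a spherical protein, the viscosity of pure water at 25 °C (the actual buffer contained 500 mM NaCl), a typical partial specific volume of 0.73 cm³/g, and a 3.2 Å hydration layer. The function name and the masses of 19.3 and 38.6 kDa (tagged monomer and dimer) are illustrative choices.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
N_A = 6.02214076e23   # Avogadro constant, 1/mol


def stokes_tau_c_ns(mass_kda, temp_k=298.15, viscosity_pa_s=0.89e-3,
                    v_bar_cm3_per_g=0.73, hydration_angstrom=3.2):
    """Estimate the rotational correlation time (ns) of a hydrated sphere.

    All defaults are assumptions: water viscosity at 25 C, a typical
    protein partial specific volume, and a single hydration shell.
    """
    # Volume of one protein molecule, converted from cm^3 to m^3
    volume_m3 = v_bar_cm3_per_g * mass_kda * 1000.0 / N_A * 1e-6
    # Radius of the equivalent sphere plus the hydration layer
    radius_m = (3.0 * volume_m3 / (4.0 * math.pi)) ** (1.0 / 3.0)
    radius_m += hydration_angstrom * 1e-10
    # Stokes-Einstein-Debye: tau_c = 4*pi*eta*r^3 / (3*k_B*T)
    tau_s = 4.0 * math.pi * viscosity_pa_s * radius_m ** 3 / (3.0 * K_B * temp_k)
    return tau_s * 1e9


tau_monomer = stokes_tau_c_ns(19.3)   # tagged monomer (assumed mass)
tau_dimer = stokes_tau_c_ns(38.6)     # tagged dimer (assumed mass)
measured = 18.1                       # ns, value reported in the text
closer = "dimer" if abs(measured - tau_dimer) < abs(measured - tau_monomer) else "monomer"
print(f"monomer ~{tau_monomer:.1f} ns, dimer ~{tau_dimer:.1f} ns, measured is closer to the {closer}")
```

Under these assumptions the monomer estimate is roughly 8 ns and the dimer estimate roughly 15 ns, so the measured 18.1 ± 2.7 ns sits much closer to the dimer value. Real proteins are not perfect spheres, so the numbers should be read as order-of-magnitude guides only.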
With dimensions of approximately 72 × 42 × 32 Å, the cce_0566 dimer shown in Fig. 1 has a roughly elliptical shape. Fig. 2A and B are cartoon representations of a single cce_0566 molecule shown in two different orientations with the representation on the right (B) rotated ~90° relative to the orientation on the left (A). Fig. 2C is a secondary structure diagram that details the residues in each secondary structure element and approximates the relative orientation of the secondary structure elements to each other. Each monomer is composed of eight α-helices folded on one side of a three-stranded anti-parallel β-sheet. The first three helices form the narrow ends of the ellipse while the two longest helices, α5 and α7, cross over each other to form the framework for the wide central part of the ellipse. As highlighted in Fig. 3, the strands of each β-sheet cross over at a ~90° angle to form a “checkered” lattice at the dimer interface. Most of the side chains of the residues in the β-sheet are hydrophobic, and hydrophobic contacts between the two β-sheets likely play a significant role in holding the dimer together. Indeed, this feature may be universal in all DUF269 proteins, as sequence alignment of the most dissimilar proteins in the DUF269 family, shown in Fig. 4, illustrates that these β-strands are composed largely of conserved hydrophobic residues.
Fig. 5 shows the solvent accessible surface on the cce_0566 dimer labeled by electrostatic surface potential (Fig. 5A) and conserved regions identified by ConSurf analysis (Fig. 5B). While there are many large, well defined regions of positive and negative charge on the protein’s surface, they do not correspond to highly conserved regions, suggesting that much of the surface electrostatic potential is not biologically significant. On the other hand, the solvent accessible surface rendering shows a cleft (solid black arrow) composed of highly conserved residues near the dimer interface. Furthermore, some of the conserved surface in this cleft is charged. The sequence alignment in Fig. 4 indicates that these conserved regions are principally α6 plus adjacent residues and the turn between β1 and β2, regions that are physically near each other (Figs. 2 and 3). Included in this conserved region are the charged residues E114, K125, R134, D135, and R138. Such conservation of charged residues in a solvent accessible cleft may represent the active site or ligand-binding surface of cce_0566.
Fig. S1A is the steady-state CD spectrum for cce_0566 collected at 25 °C. The spectrum is dominated by a double minimum at approximately 224 and 208 nm and an extrapolated maximum around 195 nm, features characteristic of α-helical secondary structure [35, 36]. Such an observation is consistent with the amount of helical structure (57%) observed in the crystal structure of the protein. Note that the double minimum is skewed and more intense around 208 nm, likely due to the contribution of other elements of secondary structure to the CD steady-state spectrum.
The thermal stability of cce_0566 was measured by monitoring the ellipticity at a specific wavelength as a function of temperature. As shown in Fig. S1B, a gradual increase in ellipticity at 215 nm is observed for cce_0566 up to ~60 °C followed by a rapid increase in ellipticity that plateaus slightly above ~70 °C. Visual inspection of the sample after heating to 80 °C showed evidence for precipitation, indicating that the heat-induced protein unfolding was irreversible. Consequently, the CD data for cce_0566 cannot be analyzed thermodynamically; however, a quantitative estimation of the Tm for this transition may still be obtained by assuming a two-state model and taking a first derivative of the curve in Fig. S1B. The maximum of this first derivative, shown in Fig. S1C, occurs at 70 °C.
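The first-derivative estimate of Tm described above can be sketched in a few lines. The example below does not use the measured data: the melting curve is synthetic (a sloping baseline plus a sigmoidal transition, in arbitrary ellipticity units), the 2.0 °C temperature grid and the 70 °C midpoint are chosen only to mirror the description in the text, and the function name is hypothetical. The derivative is taken as a simple finite difference between adjacent points and assigned to the midpoint of each interval.

```python
import math


def first_derivative_tm(temps, signal):
    """Return the temperature at which d(signal)/dT is largest.

    The slope is computed between adjacent points and reported at the
    midpoint of the interval with the steepest rise.
    """
    best_slope = None
    best_mid = None
    for i in range(len(temps) - 1):
        slope = (signal[i + 1] - signal[i]) / (temps[i + 1] - temps[i])
        if best_slope is None or slope > best_slope:
            best_slope = slope
            best_mid = 0.5 * (temps[i] + temps[i + 1])
    return best_mid


# Synthetic melt: 5 to 79 C in 2.0 C steps (assumed grid, not the real data)
temps = [5.0 + 2.0 * i for i in range(38)]
# Gradual baseline rise plus a sharp two-state-like transition centered at 70 C
signal = [-20.0 + 0.02 * (t - 5.0) + 12.0 / (1.0 + math.exp(-(t - 70.0) / 1.5))
          for t in temps]

tm_estimate = first_derivative_tm(temps, signal)
print(f"Estimated Tm: {tm_estimate:.1f} C")
```

With real data, noise can produce spurious derivative maxima, so smoothing the curve before differentiation is a common precaution. The resolution of the estimate is also limited by the temperature step.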
The primary reason that cce_0566 is believed to play a role in nitrogen fixation is circumstantial evidence: the gene is located in an operon of genes involved in the biochemical process. Further circumstantial evidence that cce_0566 may play a biochemical role in nitrogen fixation comes from transcriptomics microarray data for Cyanothece 51142 cultures grown as previously described for 48 h in alternating 12 h periods of light and dark. Fig. 6 shows the results of global transcriptomic data (European Bioinformatics Institute ArrayExpress database, accession numbers AMEXP-864 and E-TABM-337) for cce_0566, cce_0567 [21], and three essential nitrogen fixation genes, nifB, nifH, and nifN [40]. The expression profile for cce_0566 oscillates in sync with its neighbor, cce_0567, and other known, essential nif genes. Hence, the transcriptomics evidence places cce_0566 one step closer to a potential biological function, from gene location in the nitrogen fixation operon to physical expression at the appropriate time in the cell cycle. Indeed, in Cyanothece 51142 it was observed that all 34 genes in the nitrogen fixation transcriptional regulon exhibited strong co-regulation in their expression. It is possible that the cce_0566 gene product is now non-essential and is only expressed because it still happens to exist among the essential genes in the nitrogen fixation transcriptional regulon. However, the functions of 16 of the 34 genes in the nitrogen fixation transcriptional regulon, including cce_0566 (DUF269), are not known. If unknown function translates into non-essential in the nif operon, it seems illogical that the cell would waste resources expressing cce_0566 along with so many other non-essential genes.
To identify a possible biological function for cce_0566 and the other proteins in the DUF269 family, the RCSB Protein Data Bank was searched for structures with similarities to cce_0566 using the DALI search engine. The best match, with a Z-score of 22.4, was the structure used as the molecular replacement model to determine the structure for cce_0566, the DUF269 annotated protein from A. ferrooxidans (3G7P). The amino acid sequences of the two proteins are 41% identical and 61% conserved. The two structures are similar, with a backbone RMSD of 2.47 Å, as shown in their superposition illustrated in Fig. 7 (SuperPose, v 1.0). Indeed, the major difference is an additional α-helix, α1, at the N-terminus of cce_0566. Aside from the DUF269 protein from A. ferrooxidans, the best matches with Z-scores above 4.0 identified by the search were three dissimilar proteins: EcoRI endonuclease (4.6), F-actin-capping protein subunit alpha-1 (4.4), and prolyl-tRNA synthetase (4.1). Hence, the fold adopted by the DUF269 proteins appears to be unique to this family of proteins. Given that the DUF269 gene is only found in the nitrogen fixation transcriptional regulon present in a select number of diazotrophic genera, the DUF269 gene product may have a specialized function in nitrogen fixation, and hence, warrants further biochemical studies to identify its precise biological role in the nitrogen fixation process.
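Percentages of identical and conserved residues such as those quoted above are computed from a pairwise alignment. The sketch below shows one simple way to do this; it is an illustration only, not the method used in this work. The two aligned strings are toy sequences (not cce_0566 or the A. ferrooxidans protein), the conservative-substitution groups are a hypothetical simplification (alignment tools normally rely on a scoring matrix), and the choice of denominator (ungapped aligned columns) is one of several conventions in use.

```python
# Hypothetical groups of residues treated as conservative substitutions
CONSERVATIVE_GROUPS = ["AVLIMC", "FWY", "KRH", "DE", "STNQ", "G", "P"]


def identity_and_similarity(aligned_a, aligned_b):
    """Return (% identical, % conserved) over ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    identical = sum(1 for x, y in columns if x == y)
    # Identical residues also count as conserved
    conserved = sum(
        1 for x, y in columns
        if x == y or any(x in g and y in g for g in CONSERVATIVE_GROUPS)
    )
    n = len(columns)
    return 100.0 * identical / n, 100.0 * conserved / n


# Toy alignment, for illustration only
seq_a = "MKT-LLDE"
seq_b = "MRTAVLDD"
pct_identity, pct_conserved = identity_and_similarity(seq_a, seq_b)
print(f"{pct_identity:.1f}% identical, {pct_conserved:.1f}% conserved")
```

Because the result depends on the alignment, the substitution groups, and the denominator, values from different tools are not always directly comparable.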
This work was initiated as part of a Membrane Biology EMSL Scientific Grand Challenge project at the W.R. Wiley Environmental Molecular Sciences Laboratory, a national scientific user facility sponsored by the U.S. Department of Energy’s Office of Biological and Environmental Research (BER) program located at Pacific Northwest National Laboratory (PNNL). Battelle operates PNNL for the U.S. Department of Energy. The PNNL Laboratory Directed Research Development (LDRD) program assisted completion of this research. The assistance of the X29A beamline scientists at the National Synchrotron Light Source at Brookhaven National Laboratory is appreciated. Support for beamline X29A at the National Synchrotron Light Source comes principally from the Offices of Biological and Environmental Research and of Basic Energy Sciences of the U.S. Department of Energy, and from the National Center for Research Resources of the National Institutes of Health.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.febslet.2012.01.037.