A Mycobacteria model for structure-based studies
That ispF is essential in M. tuberculosis validates the encoded enzyme as a chemotherapeutic target. We tried to determine the structure of MtIspF to aid rational ligand design, but the protein, though efficiently produced in recombinant form, was recalcitrant to crystallization. MtIspF shares 73% amino acid identity with MsIspF, so we chose to study this orthologue on the basis that it would provide a suitable model of the pathogen enzyme. Recombinant MsIspF is produced in high yield (approximately 30 mg L⁻¹ of bacterial culture), can be purified readily, and provided well-ordered single crystals. A surface model of MsIspF, colored by shared identity with MtIspF, highlights the strong resemblance between these sequences, particularly at the active site (Figures , , ). Generally, an accurate homology model is attainable when sequence identity is high (>60%) [30], and such models have been used successfully in structure-based ligand design. In certain cases, homology models have proved useful even with <60% sequence identity. Examples include human carbonic anhydrase [32] and Rho kinase [33], where models were constructed from sequences that shared only 38% and 37% identity, respectively.
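The identity figures quoted above (73%, 38%, 37%) are percentages of matching positions in a pairwise alignment. A minimal sketch of that calculation follows, assuming the two sequences are already aligned and that columns containing a gap in either sequence are excluded from the denominator; other conventions exist, and the text does not say which one was used. The function name and the toy sequences are hypothetical and are not IspF sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gap-containing columns are skipped. Other denominators
    (e.g. the length of the shorter sequence) give slightly different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned (non-gap) columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment for illustration only.
toy_a = "MKV-LTAG"
toy_b = "MRVALT-G"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

In the toy alignment, six columns are compared and five match.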
Figure 4 IspF homology. Amino acid sequence alignment of MsIspF, MtIspF, and EcIspF. Secondary structure elements of MsIspF are shown above the sequence. β-strands are blue, α-helices red, and 3₁₀-helical segments aquamarine.
Figure 5 The van der Waals surface of the active site colored according to shared sequence identity with MtIspF. Identical residues are colored slate-blue and similar residues are purple. The active site Zn²⁺ is a grey sphere, and CDP is shown as a stick model.
Figure 6 Stereo-view of the active site overlay of MsIspF and EcIspF. Residues from MsIspF are labeled and CDP-B is the depicted conformer. Protein atoms are colored: C of MsIspF green, C of EcIspF wheat, all N atoms blue, O atoms red, and Se atoms magenta.
The structure of MsIspF bound to CDP was determined to a resolution of 2.2 Å. There are three subunits (chains A, B, and C) in the asymmetric unit, forming a homotrimer about a non-crystallographic axis. The model comprises residues 3–157 for each subunit, with residues 36–37 absent in chains A and B. Structures of several ligand-bound and native forms of IspF from Campylobacter jejuni, E. coli, Shewanella oneidensis, and Thermus thermophilus are available in the Protein Data Bank [PDB, [15]]. EcIspF [PDB code 1GX1, [16]] was chosen as the model for the structural comparisons to follow because it was built using high-resolution data (1.8 Å) and contains the ligand CDP. The two sequences share 38% identity, and the r.m.s.d. values for the superposition of the MsIspF trimer onto the EcIspF trimer range from 1.10 to 1.16 Å, depending upon which chains are aligned.
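The r.m.s.d. quoted for the superposition is the root-mean-square distance between matched atom pairs. The sketch below shows only that final step and assumes the two coordinate sets have already been superposed and paired atom-for-atom (for example, equivalent Cα atoms); the optimal superposition itself is not implemented here, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) tuples.

    Assumption: the structures are already superposed and the lists are in
    the same atom order; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Synthetic coordinates for illustration: every atom is displaced by 1 Å along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(set_a, set_b))
```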
MsIspF closely resembles EcIspF. Each subunit displays an α/β fold which contains six β-strands, five α-helices, and two 3₁₀ helices (Figures , , ). Four of the strands (β1, β4–6) comprise a central β-sheet that packs against the α- and 3₁₀ helices. The other two strands form a short sheet at the end of a loop that extends into the space between α1, α3, and α4. One 3₁₀ helix (θ2) of MsIspF overlays with that of EcIspF (designated θ1 by [16]), but the second (θ1) occurs between α2 and α3 rather than following α4.
Figure 7 Ribbon diagram of the trimer. The MsIspF trimer viewed down the molecular three-fold axis. The individual subunits are shown in slate, wheat, and purple. Selected secondary structure elements of the wheat subunit, CDP, and Zn²⁺ are also depicted.
Ribbon diagram of the trimer. Orthogonal view compared to Figure 7.
Trimer formation arises from edge-to-face packing of the β-sheets, with the largest section of the interface occurring between β1 and β5 of adjacent subunits (Figures , ). Thus, the interior of the trimer resembles a trigonal prism whose faces are composed of β-sheets from the individual subunits. The MsIspF trimer has the same overall dimensions as EcIspF, measuring approximately 40 Å in height along the three-fold symmetry axis and 60 Å in diameter at the widest point perpendicular to this axis. In addition, as in the E. coli enzyme, most of the hydrogen bonds between the subunits involve side chain interactions. The trimer interface interactions also resemble those of E. coli in that they are primarily hydrophobic; approximately 65% of the atoms comprising each of these enzyme interfaces are non-polar. E. coli and M. smegmatis are mesophiles. In contrast, only 58% of atoms in the interface of IspF from the thermophile T. thermophilus are non-polar [35].
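The non-polar percentages above are fractions of interface atoms. As a rough sketch of such a tally, the snippet below classifies atoms by element alone, treating carbon as non-polar and everything else (nitrogen, oxygen, sulfur) as polar. That classification is an assumption made for illustration; the text does not state which definition or software was used, and the atom list is synthetic.

```python
# Assumption for illustration: only carbon counts as non-polar.
NONPOLAR_ELEMENTS = {"C"}


def percent_nonpolar(elements):
    """Percentage of atoms whose element symbol is in NONPOLAR_ELEMENTS."""
    elements = list(elements)
    if not elements:
        raise ValueError("no interface atoms supplied")
    nonpolar = sum(1 for element in elements if element.upper() in NONPOLAR_ELEMENTS)
    return 100.0 * nonpolar / len(elements)


# Synthetic interface: 13 carbons, 3 nitrogens, 4 oxygens (not real data).
toy_interface = ["C"] * 13 + ["N"] * 3 + ["O"] * 4
print(f"{percent_nonpolar(toy_interface):.0f}% non-polar")
```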
At the center of the trimer is a hydrophobic cavity that opens toward the C-terminal ends of β1, β4 and β5. Side chains of residues Thr10, Val12, Ile102, Thr134, Leu139 and Thr140 from each subunit line the interior of the cavity, while two arginines (Arg142 from subunits A and B) and the main chain of Gly138 and Leu139 of subunit C shape the aperture (data not shown). Arg142 is held in place through an electrostatic attraction to Glu144. In EcIspF, an Arg142-Glu144 salt bridge from each of the three subunits forms the aperture. Here, subunit C is less ordered, and this contributes to the observed asymmetry. The density is poorly defined between residues 137–144 in subunit C, and the average thermal parameter for this region (58.4 Å²) is much higher than in subunits A (34.6 Å²) and B (18.1 Å²).
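A per-chain average thermal parameter over a residue range, such as the one reported for residues 137–144, can be computed directly from the ATOM records of a PDB-format file. The sketch below relies on the fixed-column layout of that format (chain identifier in column 22, residue number in columns 23–26, temperature factor in columns 61–66). The function names and the synthetic records are hypothetical, and an unweighted mean over all atoms is an assumption about how such an average might be taken.

```python
def make_atom_line(serial, name, res_name, chain, res_seq, b_factor):
    """Build a synthetic fixed-column ATOM record (illustration only)."""
    x = y = z = 0.0
    occupancy = 1.0
    return (
        f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"
    )


def mean_b_factor(pdb_lines, chain_id, first_res, last_res):
    """Unweighted mean temperature factor for ATOM records in a residue range."""
    values = []
    for line in pdb_lines:
        if not line.startswith("ATOM  ") or len(line) < 66:
            continue  # ignore non-ATOM or truncated records
        if line[21] != chain_id:
            continue
        res_seq = int(line[22:26])
        if first_res <= res_seq <= last_res:
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no atoms found for the requested chain and range")
    return sum(values) / len(values)


# Synthetic records, not taken from the deposited structure.
records = [
    make_atom_line(1, "CA", "LEU", "C", 139, 60.0),
    make_atom_line(2, "CA", "THR", "C", 140, 50.0),
    make_atom_line(3, "CA", "LEU", "A", 139, 30.0),
    make_atom_line(4, "CA", "THR", "C", 10, 10.0),
]
print(mean_b_factor(records, "C", 137, 144))
```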
The distance from the base of the cavity to the opening (16 Å) and the diameter of the aperture (6 Å) are comparable to those observed in EcIspF. The volume of the cavity of MsIspF (1940 Å³), however, is significantly larger (by roughly 26%) than that of EcIspF (1540 Å³). In EcIspF the cavity is ellipsoidal and the floor parabolic; the major axis of the ellipsoid runs from the aperture to the floor of the cavity [15]. In MsIspF the cavity is trigonal pyramidal, with the aperture corresponding to the tip of the pyramid and the floor to the base. Residue differences in the lining of the cavity contribute to the variation in shape and diameter. In E. coli, the cavity is lined with the side chains of six large hydrophobic residues, Phe7 and Phe139 from each subunit, whereas the corresponding residues in MsIspF are Thr10 and Leu139. In EcIspF, the floor of the cavity is sealed by three His5-Glu149 salt bridges [15]. Hydrophobic interactions seal the floor of MsIspF, where residues Leu8 and Ile149 replace the EcIspF salt bridge. The cavity in MtIspF should bear a strong resemblance to that in MsIspF since the residues that contribute to the lining (discussed above) are strictly identical in the two sequences (Figure ).
In common with crystal structures of other IspF trimers, non-protein electron density was observed in the hydrophobic cavity of MsIspF. In EcIspF, phosphate, farnesyl pyrophosphate, GPP, and IPP have been shown to bind within this cavity [15]. There is as yet no evidence that ligand binding here regulates enzyme activity. The cavity is distant from the three catalytic sites but, since oligomerisation is required to generate the functional enzyme (as explained below), occupancy of the hydrophobic cleft may contribute to the stability of the IspF trimer.
In MsIspF the density observed in the cavity is diffuse, and we presume that a similar mixture of ligands may be present. IPP was modeled into this density at 50% occupancy based on fit and on ligand identification in the EcIspF cavity. Although a methodical and thorough approach was used in fitting the ligand, the thermal parameters of IPP (47.5 Å²) exceed the average of the protein (27.5 Å²). The ligand-protein interactions, though not clearly defined, do resemble those observed in EcIspF [PDB code 1H47, [15]]. The guanidino groups of Arg142 from two subunits bind to the β-phosphate; in EcIspF, the side chain from the corresponding residue (Arg142) of all three subunits contributes to this interaction. In MsIspF, the bridging phosphodiester oxygen of IPP binds to the amide of Leu139 in subunit C and one of the α-phosphate oxygens binds to the main chain amide of Leu139 in subunit B. In EcIspF, these ligand atoms interact with the main chain amide of the corresponding residue (Phe139) of all three subunits.
There are three active sites in the trimer, each located at the interface between two adjacent subunits. The active site (Figures , ) comprises a rigid nucleotide and cation (Zn²⁺) binding pocket and a flexible loop for binding the ME2P moiety of substrate [16]. Only one of the two cation-binding sites, the Zn²⁺ site, is occupied here [8]. This cation is approximately 75% occupied in two subunits and 50% occupied in the third. As observed in other IspF structures, the Zn²⁺ displays tetrahedral coordination by Asp11, His13, His45, and the β-phosphate of CDP. In the higher resolution model of EcIspF, the second cation (Mg²⁺) is coordinated by the side chain of Glu135 and two oxygens from the diphosphate of CDP. In MsIspF and MtIspF, the glutamate is replaced with aspartate. This residue is strictly conserved as an aspartate or glutamate across 450 IspF sequences (data not shown), suggesting that a negative charge is required to coordinate the second cation and that either negatively charged amino acid will suffice. In the structure of MsIspF, the lower resolution data or CDP disorder may preclude identification of the second cation (see below).
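One simple way to check how closely a four-ligand metal site approaches tetrahedral geometry is to measure the six ligand-metal-ligand angles and compare them with the ideal value of about 109.5°. The sketch below is illustrative only: the function names are hypothetical, the coordinates are those of an ideal tetrahedron rather than refined atom positions, and no such analysis is reported in the text.

```python
import math
from itertools import combinations

# Ideal tetrahedral angle, arccos(-1/3), about 109.47 degrees.
IDEAL_TETRAHEDRAL_ANGLE = math.degrees(math.acos(-1.0 / 3.0))


def angle_deg(center, atom_a, atom_b):
    """Angle atom_a-center-atom_b in degrees."""
    va = [a - c for a, c in zip(atom_a, center)]
    vb = [b - c for b, c in zip(atom_b, center)]
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(x * x for x in vb))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("ligand atom coincides with the metal position")
    cos_theta = sum(x * y for x, y in zip(va, vb)) / (norm_a * norm_b)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


def max_deviation_from_tetrahedral(metal, ligands):
    """Largest absolute deviation of any ligand-metal-ligand angle from ideal."""
    return max(
        abs(angle_deg(metal, a, b) - IDEAL_TETRAHEDRAL_ANGLE)
        for a, b in combinations(ligands, 2)
    )


# Ideal tetrahedron around the origin (synthetic coordinates).
metal = (0.0, 0.0, 0.0)
ligands = [(1.0, 1.0, 1.0), (1.0, -1.0, -1.0), (-1.0, 1.0, -1.0), (-1.0, -1.0, 1.0)]
print(max_deviation_from_tetrahedral(metal, ligands))
```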
Two conformers of CDP, each at approximately half occupancy, are present in each of the three active sites of the trimer. We show only one conformer in Figures and for the purpose of clarity. In the conformers, the ligand-protein interactions are maintained for the pyrimidine and the ribose but diverge at the diphosphate. The average thermal parameters of the conformers, hereafter referred to as CDP-A and CDP-B, are 20.8 and 29.2 Å², respectively. The presence of CDP disorder is likely linked to the incompletely occupied Zn²⁺ binding site. The mode of ligand binding of CDP-B more closely resembles that observed in EcIspF (Figure ). In this mode, three interactions are present between the protein and ligand diphosphate. Two of these are hydrogen bonds formed between the α-phosphate and the side chain hydroxyl and main chain amide of the strictly conserved Thr133. The third is ligand-metal ion coordination between the β-phosphate and the active site Zn²⁺. The interaction between the Zn²⁺ and the β-phosphate is preserved in CDP-A, but an additional hydrogen bond occurs between the β-phosphate and the hydroxyl group of Thr132. In CDP-A, the α-phosphate also forms a hydrogen bond with the side chain hydroxyl of Thr133, but the bridging diphosphate oxygen interacts with the main chain amide of this residue and the side chain of Thr132 instead.
The architecture of the active site at the cytosine and Zn²⁺ binding sites and the interactions formed with CDP by MsIspF are similar to those observed in EcIspF. Furthermore, MsIspF residues that contribute to this binding site are all identical or conserved in MtIspF (Figure ). The cytosine is bound in an aliphatic pocket created by side chains of residues from β5 and the loop between β4 and θ2 from a single subunit. The cytidine is stacked between the side chains of Ala131 and Lys107, which are strictly conserved in E. coli and M. tuberculosis. The binding sites in both EcIspF and MsIspF are characterized by four hydrogen bonds between the pyrimidine and main chain atoms of the protein. In MsIspF, these backbone atoms are from residues Gly103, Pro106, Val108 and Gly109, and, in EcIspF, Ala100, Pro103, Met105, and Leu106. These residues are strictly conserved in MsIspF and MtIspF with the exception that MtIspF Ile109 replaces MsIspF Val108 (Figure ).
Because the interactions involve backbone atoms, high conservation of these residues is not necessarily required. The critical elements required to maintain similar protein-ligand interactions are the shape and size of the cytosine pocket. Two pairs of hydrophobic interactions contribute to this function in MsIspF. One pair of hydrophobic interactions occurs between the side chains of Pro106 and Leu146 and the second between the side chains of Val101 and Val108. Both sets of residues are highly conserved (>85%) in 450 IspF sequences, including both E. coli and M. tuberculosis. The first is conserved as a proline-leucine/isoleucine pair and the second as two aliphatic residues, where the identities of the residues are leucine, methionine, valine, isoleucine, or phenylalanine.
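Conservation figures such as ">85% in 450 IspF sequences" amount to counting, at one column of a multiple sequence alignment, how many sequences carry a residue from an allowed set. A minimal sketch under that reading follows; the function name and the four-sequence toy alignment are hypothetical stand-ins, not the 450-sequence alignment referred to in the text.

```python
def column_conservation(aligned_seqs, column, allowed):
    """Percentage of sequences whose residue at `column` is in `allowed`.

    `column` is a zero-based index into the alignment; gaps count as
    non-matching unless the gap character is included in `allowed`.
    """
    if not aligned_seqs:
        raise ValueError("alignment is empty")
    residues = [seq[column] for seq in aligned_seqs]
    matches = sum(1 for residue in residues if residue in allowed)
    return 100.0 * matches / len(residues)


# Toy alignment (not real IspF sequences): check column 1 for Ile/Leu/Val.
toy_alignment = ["AIL", "AVL", "ALM", "AFL"]
aliphatic = {"I", "L", "V"}
print(column_conservation(toy_alignment, 1, aliphatic))
```

Widening the allowed set (for example, adding methionine and phenylalanine, as in the aliphatic pair described above) raises the reported conservation, so the set should be stated alongside the percentage.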
The ribose hydroxyls are oriented by several hydrophilic interactions involving strictly conserved residues in MsIspF, EcIspF and MtIspF. The ribose hydroxyls form hydrogen bonds with the side chain of Asp59* (the asterisk denotes contributions from another subunit) and the amide of Gly61*, and solvent-mediated interactions are observed with the side chain of Asp49* and the carbonyl of Ala131 (residues Asp56*, Gly58*, Asp46* and Ala131 in EcIspF, respectively). Moreover, in MsIspF and EcIspF, the side chain orientation of Asp59* is maintained through hydrophilic interactions. Here, this aspartate accepts hydrogen bonds donated by amides of Gly61*, Thr62*, and Ala131 and the side chain of Thr62*, and, in EcIspF, with the amides of Gly58*, Lys59*, and Ala131. The MsIspF residues that contribute to the orientation of Asp59* are strictly conserved in the sequence of MtIspF except for Thr62*, which is a glutamate in the latter. Main chain atoms are the primary contributors to stabilization of Asp59*, so this amino acid replacement is unlikely to affect conformation or function.
The nucleotide-binding pocket is only part of the active site. In EcIspF the remaining fragment of substrate, ME2P, is bound by contributions from α2, α3, residues 33–37, and a flexible loop comprising residues 61–71 [16]. The largest Cα r.m.s.d. differences between EcIspF and MsIspF occur in this loop. In EcIspF, the loop is stabilized by hydrogen bonds between the side chain of His34* and the carbonyl atoms of Asp63* and Asp65*. His34* is conserved in MsIspF (His37*), but the aspartates are not. Here, no well-defined density is observed for His37* in two of the trimer subunits. In the third, the side chain of this residue forms hydrogen bonds to the main chain carbonyl of Arg68* and the side chain of Asp67*. The former resembles the EcIspF His34*-Asp65* interaction, but the latter reflects the different conformations of this loop present in the two orthologues. This loop is further stabilized in MsIspF by a hydrophilic interaction between the carbonyl of Ile63* and the side chain of Arg68*, a residue which is not conserved in EcIspF. The stabilization of the loop through hydrogen bonding to the side chain of an aspartate, as observed in MsIspF, can be maintained in MtIspF as this residue is identical, but the arginine is replaced by a second aspartate. Although the main chain interactions might be preserved by an aspartate, the side chain interactions could not.
ME2P is oriented by several hydrophobic and hydrophilic interactions with EcIspF. The amides of Ser35* and His34* and the hydroxyl of Ser35* form hydrogen bonds with oxygens of the attacking 2-phosphate group. The identities and positions of the residues that bind and orient the attacking 2-phosphate group are maintained in MsIspF (Figure ). In EcIspF, the side chains of Ile57* and Leu76* make van der Waals contacts with the methyl group of ME2P. These residues are replaced by another hydrophobic pair, Leu60* and Met78*, in MsIspF. When a model of EcIspF containing substrate [PDB code 1U43, [18]] is superimposed onto MsIspF, these residues are able to maintain contact with the methyl group of ME2P. The 3-hydroxyl group of the ligand interacts with the carbonyl of Phe61 in EcIspF. This residue is part of the flexible loop, and the equivalent residue in MsIspF (Phe64*) does not maintain this interaction in the superposition. Phe64* is preceded by a glycine in MsIspF and a proline in EcIspF. Glycine flexibility would permit a conformational change to accommodate interactions between Phe64* and the ligand. Alternatively, there is a hydrogen bond present between the carbonyl of Gly65* and the 3-hydroxyl group of the superimposed substrate. The MsIspF residues described above are all identical in MtIspF except for Leu60*, which is an isoleucine instead. The binding component of this residue, the side chain hydrophobicity, is maintained in MtIspF, as this residue is also an isoleucine in E. coli and is strictly conserved as isoleucine, leucine, or valine in 450 IspF sequences.