The biarsenical tetracysteine motif is a useful tag for genetic labeling of proteins with small molecules in living cells. The present study concerns the structure of a 12-amino acid peptide FLNCCPGCCMEP bound to the fluorophore ReAsH, based on resorufin. 1H-NMR was used to determine the solution structure of the complex formed between the peptide and the ReAsH moiety. Structure calculations based on the NMR results showed that the backbone structure of the peptide is fairly well defined with a hairpin-like turn, similar to a β turn type II, formed by the central CPGC segment. The most stable complex was formed when As2 was bonded to C4 and C5, and As1 was bonded to C8 and C9. Two clear NOESY crosspeaks between the Phe1 sidechain and ReAsH confirmed the close positioning of the phenyl ring of Phe1 and ReAsH. Phe1 was found to have an edge-face geometry relative to ReAsH. The close interaction between Phe1 and ReAsH may be highly significant for the fluorescence properties of the ReAsH complex.
Genetically targeted labeling of proteins with small molecules in intact living cells has become popular in the last decade because such tagging combines genetic precision of targeting with the versatile properties of small molecules. This strategy requires that the protein of interest be fused genetically to a peptide or protein domain that can bind a small ligand tightly and selectively enough that the ligand labels the chimeric protein nearly stoichiometrically and exclusively in a living cell. Many such peptide domain–small molecule ligand pairs have been introduced, but the first to be introduced, the tetracysteine–biarsenical system, has been the most widely used1–4. Here the peptide domain is of the form X3C2X2C2X3, where C = cysteine and X = a non-cysteine amino acid, and the small molecule is a tricyclic fluorophore with As(III) substituents at the 4,5-positions. Each As(III) reversibly forms two covalent bonds to cysteine thiols. We have used NMR to determine the solution structure of the 12-mer peptide FLNCCPGCCMEP in complex with the ReAsH fluorophore. This peptide system was previously selected after several generations of optimization, culminating in high-throughput genetic screens and fluorescence-activated cell sorting3. The present three-dimensional structure shows a relatively tight peptide hairpin formed around the fluorophore, which interacts closely with the N-terminal phenylalanine sidechain.
The advantages of the investigated tetracysteine–biarsenical label system include the small size of the peptide domain, only 12 residues, the modest size of the biarsenical (≤ 750 Da), the membrane permeability of the latter, and the very tight binding (sub-picomolar dissociation constants). Low micromolar concentrations of 1,2-ethanedithiol (EDT) or 2,3-dimercaptopropanol prevent toxicity by diverting the biarsenical from binding to most endogenous molecules containing pairs of thiol groups. Millimolar concentrations of the same vic-dithiols can competitively remove the biarsenical from the tetracysteine domains, so that the labeling can be reversed if desired. The fluorescence of the biarsenical moiety is strongly quenched until it binds to the peptide domain. However, the biggest limitation remains background labeling of endogenous proteins. The main strategy for reducing such background has been to evolve tetracysteine domains with higher affinity so that good labeling can be maintained with lower concentrations of biarsenical, higher concentrations of vic-dithiol competitor, or both2, 3.
Despite many biological applications of this and previous tetracysteine domains, no structure of a biarsenical complex has been reported. Attempts to crystallize the complex, either alone or fused to fluorescent proteins, have not yet been successful. Here we turned to NMR, which can reveal the three-dimensional structure in aqueous solution. We had a choice between the two most widely used (and commercially available) biarsenicals, “FlAsH” and “ReAsH”, based on fluorescein and resorufin, respectively. Complexes of FlAsH with tetracysteine peptides usually can be resolved by HPLC into two closely spaced peaks of identical amplitude and mass spectra, probably atropisomers with opposite orientations of the carboxyphenyl group relative to the xanthene rings and the peptide. ReAsH lacks the bulky carboxyphenyl substituent, gives single peaks on HPLC, and was considered preferable. The complex of ReAsH with FLNCCPGCCMEP (C-terminus amidated) was not soluble enough for natural-abundance multidimensional NMR, so we acylated the N-terminus with hydrophilic substituents such as sulfo, succinyl, and MeO(CH2CH2O)nCH2CH2CO-. Succinyl, −OOCCH2CH2CO-, was chosen because it increased the solubility the most (to 1 mM).
Figure 1 shows the schematic tetracysteine-biarsenical system investigated in this study, with the numbering of the peptide residues and the protons of the resorufin moiety.
CD spectra of the biarsenical complex showed a minimum at 195 nm (Figure S1), suggesting that random coil dominates the secondary structure. A certain contribution from β-structure was suggested by a negative shoulder at approximately 212 nm.
The peptide–ReAsH complex in aqueous solution was investigated by NMR. Using standard homonuclear NMR procedures5, all peptide backbone proton resonances and essentially all sidechain proton resonances were assigned (Table S1). ReAsH by itself is symmetrical, and there are two non-equivalent protons in each equivalent ring. A natural-abundance 13C-1H HSQC spectrum gave assignments of proton-bonded carbons, in the peptide as well as in resorufin (Table S1), and verified some of the proton assignments. 1H chemical shifts were referenced with respect to the H2O signal at 4.96 ppm6 and 13C chemical shifts were referenced indirectly7.
There are rather large deviations from random coil values of chemical shifts8 for some proton resonances (Figure S2). The presence of the thiol-arsenic bonds causes an approximately 10 ppm down-field shift of the Cβ resonances of the cysteines. The tight structure and potential ring-current shifts are confounded with the presence of the As atoms, making deduction of structural constraints from secondary chemical shifts uncertain.
An NMR translational diffusion experiment9 was performed in order to determine the size of the complex and its oligomeric state in solution. The result showed a translational diffusion coefficient of 12.5 × 10−11 m2 s−1. From this coefficient, a hydrodynamic radius and a molecular weight of 1616 Da were determined10 for the diffusing species. This molecular weight is close to that of a monomer (1772.7 Da), showing that the ReAsH bound to the peptide exists as a monomer in solution at pH 7.4 and 278 K.
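The first step of this conversion, from diffusion coefficient to hydrodynamic radius, can be illustrated with the Stokes-Einstein relation for a sphere. The following is a minimal sketch, not the procedure of ref 10: the viscosity value (roughly that of H2O near 278 K) and the function name are assumptions introduced here, and the subsequent radius-to-molecular-weight step is not reproduced because the text does not state it.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def hydrodynamic_radius(d_coeff, temp_k, viscosity):
    """Stokes-Einstein radius (m) of a sphere diffusing with coefficient d_coeff (m^2/s)."""
    return K_B * temp_k / (6.0 * math.pi * viscosity * d_coeff)


D_COMPLEX = 12.5e-11   # m^2/s, value reported in the text
TEMP_K = 278.0         # K, value reported in the text
ETA_ASSUMED = 1.52e-3  # Pa*s, ASSUMED viscosity of H2O near 278 K (not given in the text)

r_h = hydrodynamic_radius(D_COMPLEX, TEMP_K, ETA_ASSUMED)
print(f"Hydrodynamic radius: {r_h * 1e10:.1f} Angstrom")
```

Under these assumptions the radius comes out on the order of 1 nm, a plausible size for a compact species of this mass; a different solvent viscosity (for example in D2O) would shift the value accordingly.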
Crosspeaks in a 1H-1H NOESY spectrum recorded with a mixing time of 300 ms were used to calculate distance constraints for structure calculation using the program CYANA11,12. The intensities (volumes) of the crosspeaks were converted into upper distance constraints. No long-range NOEs were observed, but almost all sequential inter-residue NOEs and some non-sequential medium-range NOEs were found (Figure 2). We observed NOEs between the β protons of Cys4 and the α, β, and aromatic protons of Phe1 (Table 1). Two NOEs were found between ReAsH and the peptide: between the δ and ε (ζ) protons of the sidechain of Phe1 and the 16 (12) proton of ReAsH (Figure S3).
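The conversion of crosspeak volumes into upper distance limits generally relies on the r^-6 dependence of the NOE. The sketch below shows that relation with a single reference peak; it is an illustration only, not the calibration routine of CYANA, and the function name, the reference values, and the clamping bounds are assumptions introduced here.

```python
def noe_upper_limit(volume, ref_volume, ref_distance, lo=2.4, hi=5.5):
    """Upper distance limit (Angstrom) from a NOESY crosspeak volume.

    Assumes volume is proportional to r**-6 and calibrates against one
    reference peak of known distance. lo and hi are ASSUMED clamping bounds.
    """
    if volume <= 0 or ref_volume <= 0:
        raise ValueError("volumes must be positive")
    distance = ref_distance * (ref_volume / volume) ** (1.0 / 6.0)
    return min(max(distance, lo), hi)


# Hypothetical numbers: a peak 64 times weaker than a 2.5 Angstrom reference
# corresponds to twice the distance.
limit = noe_upper_limit(volume=1.0, ref_volume=64.0, ref_distance=2.5)
print(f"{limit:.2f}")
```

Because of the sixth-root dependence, even large errors in the measured volume translate into modest errors in the distance limit, which is one reason such limits are used as upper bounds rather than exact distances.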
The β-protons of Pro6 were stereospecifically assigned. No stereospecific assignment was meaningful for the other β-protons, which were treated in the pseudo atom representation. The four distances between the As and S atoms (Table 1) were given the same upper and lower limits of 2.25 Å as reported from a crystal structure13. In total 233 upper distance and 6 lower distance constraints were used to calculate the structures (Table 1). The upper limits were adjusted iteratively during the structure calculation. Different combinations of binding between the As atoms and the four S atoms in the cysteines were studied by calculating the corresponding structures and comparing the energies and potential constraint violations. Only the isomer where As2 is connected to C4 and C5, and As1 is connected to C8 and C9 results in low energy structures. Other isomers are incompatible with the experimental distance constraints and result in van der Waals conflicts and high energies. Hence, it is concluded that sequential cysteine residues bind to the same As atom, as previously suggested2, based on data from mass spectrometry and fluorescence experiments on chemically modified compounds. Altogether 100 structures were calculated in CYANA. The 30 best structures, regarding violation energies and CYANA target function, were selected for further investigation of the structure.
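The set of possible As–S connectivities is small enough to list explicitly. The following sketch enumerates every way of assigning two of the four cysteines to As1 and the remaining two to As2; the function name and data layout are hypothetical, and the text does not state whether all of these combinations were actually calculated.

```python
from itertools import combinations

CYSTEINES = ("C4", "C5", "C8", "C9")


def arsenic_connectivities(cysteines=CYSTEINES):
    """Return all assignments of two cysteines to As1 and the other two to As2."""
    assignments = []
    for as1_pair in combinations(cysteines, 2):
        as2_pair = tuple(c for c in cysteines if c not in as1_pair)
        assignments.append({"As1": as1_pair, "As2": as2_pair})
    return assignments


isomers = arsenic_connectivities()
for isomer in isomers:
    print(isomer)
```

Six assignments result. The one reported above as giving low-energy structures is As1 bound to C8 and C9 with As2 bound to C4 and C5.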
Figure 3 shows an ensemble of the 30 best structures of the peptide backbone bound to the ReAsH based on the violation energies and CYANA target function. The structure calculation data is summarized in Table 1.
The peptide backbone forms a rather well-defined three-dimensional structure in which P6 and G7 contribute to a relatively tight turn. The conformation of the central CPGC fragment resembles a type II β-turn, in which the i+1 and i+2 residues should adopt the dihedral angles (ϕ, ψ) = (−60°, 131°) and (84°, 1°)14, respectively. For the PG fragment we find (ϕ, ψ) ≈ (−70°, 163°) for the P residue and (52°, −32°) for the G residue. The cysteines in the i (C5) and i+3 (C8) positions adopt β conformation. The resemblance to a type II turn explains why the peptide with the spacer residues PG displayed much better binding to ReAsH than peptides with other spacer sequences2.
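The degree of resemblance can be made concrete by computing the per-angle deviation of the observed dihedrals from the reference type II values quoted above. This is a minimal sketch; the helper names are introduced here for illustration, and only the angle values come from the text.

```python
def angle_diff(a, b):
    """Signed smallest difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


# (phi, psi) in degrees, as quoted in the text
TYPE_II_REFERENCE = {"i+1": (-60.0, 131.0), "i+2": (84.0, 1.0)}
OBSERVED = {"i+1": (-70.0, 163.0), "i+2": (52.0, -32.0)}  # Pro6, Gly7


def deviations(observed, reference):
    """Per-position (delta_phi, delta_psi) between observed and reference angles."""
    return {
        pos: tuple(angle_diff(o, r) for o, r in zip(observed[pos], reference[pos]))
        for pos in reference
    }


dev = deviations(OBSERVED, TYPE_II_REFERENCE)
print(dev)
```

All four deviations are a few tens of degrees or less, consistent with describing the turn as resembling, rather than matching, the type II geometry.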
The preference for proline in position i+1 of type II turns is very strong15, with proline preferred over all other residue types. The preference for glycine in position i+2 is even higher15. Interestingly, cysteine has the highest preference of all residues in position i+3 and the second highest preference for position i15. Hence, the CPGC sequence should fit almost perfectly into the observed structure when bound to ReAsH, which explains why the apparent dissociation constant Kd is very favorable2. However, the ReAsH peptide complex is not a classical β-hairpin. There is no evidence for hydrogen bonds between Cys5 and Cys8, nor do Cys4 and Cys9 adopt β-conformation or display hydrogen bonds. The cysteine in position 9 adopts an αL conformation to make the interaction between its sulfur atom and the arsenic atom possible. The cysteine in position 4 has the dihedral angles (ϕ, ψ) ≈ (−78°, −72°). The constraints imposed by the covalent S-As bonds to the rigid fluorophore may be the cause of this unusual conformation. If Cys4 and Cys9 adopted β-conformation, as in a classical β-hairpin, their sidechains would point away from the As atoms.
The four (Cys)S-As-C angles are relatively well defined (mean ± standard deviation over the 30 best structures): Cys4, 122° ± 2°; Cys5, 126° ± 15°; Cys8, 112° ± 15°; Cys9, 113° ± 9°. The position of ReAsH relative to the peptide was investigated by calculating the angle between the Phe1 sidechain plane and the resorufin plane in the calculated structures. The Phe1 sidechain is located almost perpendicular to the resorufin plane (angle 92° ± 21°). This edge-face geometry is not unusual for interacting aromatic systems16 and allows the partially positive hydrogens of the phenyl ring to contact the highest electron density at the center of the anionic dye. The close positioning of the phenyl ring of Phe1 and ReAsH explains why Phe is strongly preferred at position 1, as already implied by alanine scanning3. However, the structure provides no explanation for the slight preference that alanine scanning had revealed3 for Asn at position 3.
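An interplanar angle of this kind can be obtained from the normals of the two ring planes. The sketch below defines each plane by three atoms, which is a simplifying assumption (a least-squares fit over all ring atoms would be more robust), and the coordinates are hypothetical placeholders rather than values from the deposited structures.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_normal(p1, p2, p3):
    """Unit normal of the plane through three non-collinear points."""
    n = _cross(_sub(p2, p1), _sub(p3, p1))
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("points are collinear")
    return tuple(c / length for c in n)


def interplanar_angle(plane_a, plane_b):
    """Angle in degrees (0 to 180) between the normals of two three-point planes."""
    na, nb = plane_normal(*plane_a), plane_normal(*plane_b)
    cos_theta = max(-1.0, min(1.0, sum(x * y for x, y in zip(na, nb))))
    return math.degrees(math.acos(cos_theta))


# HYPOTHETICAL coordinates: one ring in the xy plane, the other in the xz plane
phenyl = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
resorufin = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
angle = interplanar_angle(phenyl, resorufin)
print(f"{angle:.1f}")
```

The direction of each normal depends on the order of the three atoms, so a consistent atom ordering across all ensemble members is needed before averaging; values near 90° correspond to the edge-face arrangement described above.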
The average distance between the N-terminus (amide N atom) and the C-terminus (carboxy C atom) of the peptide in the 30 best structures was determined to be 12.5 ± 2.2 Å. This distance is relatively small compared to, e.g., an α-helix with the same number of residues (17.8 Å), confirming the hairpin nature of the construct. The hairpin geometry suggests that this tetracysteine-biarsenical label system may also be inserted internally within a protein without much disturbance of its structure. Previous insertions were limited to minimal CCXXCC sequences17. The present dodecapeptide sequence should enable insertions with increased affinity and fluorescence brightness.
To test whether Phe1 is uniquely favorable for the fluorescence properties and complex stability, complexes were prepared with Phe1 replaced by His, Tyr or Trp. The results (Table 2) showed that the fluorescence of all three variants was quenched compared to the Phe1 variant, probably by electron transfer from the electron-rich sidechains to the fluorophore. The complex stability as assessed by time constants for ReAsH displacement by EDT was decreased by factors of about 2 or more (Table 2). The NMR structure indicated one surprisingly accessible face of the resorufin that could be available for additional interaction with the C-terminal flanking residues. Interchanging this sequence with the corresponding 3 residues from another optimized peptide, HRWCCPGCCKTF3, which also contains a potentially interacting F, to give FLNCCPGCCKTF did not change the fluorescence properties but decreased the stability to EDT 4-fold.
The NMR results and solution structure calculations of the optimized ReAsH peptide complex suggest that the N-terminus of the 12-mer peptide is relatively tightly constrained. The residues on the N-terminal side of the binding site interact strongly with the resorufin. A hairpin structure with some resemblance to a type II β-turn is formed. The C-terminus is not as well constrained by the NMR results. This could mean that the C-terminus is more dynamic and does not interact strongly with the fluorophore. The N- and C-terminal tripeptides may interact with each other while binding the fluorophore. Further optimization of the tetracysteine motif will require screening of the full 12-mer or even longer sequences.
This study was supported by the Swedish Research Council (to A.G.), the Knut and Alice Wallenberg Foundation (to A.G.), NIH GM072033 (to R.Y.T.), and HHMI (to R.Y.T.).
Supporting Information Available: Experimental methods, Figures S1–S3, Table S1. The structure coordinates of the FLNCCPGCCMEP–ReAsH motif and the NMR chemical shift list have been deposited (BioMagResBank ID code 16041).