One of the unique features of NMR spectroscopy is that it can provide a set of residue-specific probes with which to study protein–ligand interactions.1 The prime source for these probes is the 1H–15N correlation spectrum, ligand interaction sites being inferred from perturbations of 1H–15N cross-peaks upon titration of ligand into the sample. Clearly, this requires the assignment of cross-peaks to specific residues, which remains a significant challenge despite many technological developments. Recently, several groups have presented methods using selective amino acid labeling to accelerate the identification of ligand binding sites;2-4 however, these techniques only probe a very limited number of residues. Here we present a scheme for the efficient assignment of a much larger number of 1H–15N cross-peaks simultaneously using only five selectively labeled samples that can be rapidly and cost-effectively produced in parallel in a commercially available in vitro translation system. We term the method combinatorial selective labeling (CSL).
The CSL method is based upon the dual amino acid-selective 13C/15N labeling technique,5,6 in which the carbons of one amino acid type a are labeled with 13C, and the amide nitrogens of another amino acid type b are labeled with 15N. If an (a)b pair exists only once in the protein sequence then a single cross-peak will appear in the 1H–15N 2D HNCO spectrum, and the NH group of the residue type b can be unambiguously assigned. In a simple approach, the identification of each different (a)b pair would require a separate sample, demanding a prohibitively large total number of such samples. Our novel approach is to use a much smaller number of samples produced with different combinations of labeled amino acids, using the resulting patterns of cross-peak intensities across these samples to differentiate each (a)b pair.
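The sequence logic of dual selective labeling can be shown with a toy example. The sketch below lists which residues could give an HNCO cross-peak in a sample where type a is 13C labeled and type b is 15N labeled. The function name `hnco_peaks` and the sequence are hypothetical. The sketch models only which residues qualify, not peak positions or intensities.

```python
from typing import List

def hnco_peaks(sequence: str, a: str, b: str) -> List[int]:
    """1-based numbers of residues of type b whose preceding residue is of type a.

    In a sample with type a carbonyl-13C labeled and type b amide-15N labeled,
    only these residues can give a 1H-15N 2D HNCO cross-peak.
    """
    if b == "P":
        return []  # proline has no amide proton, hence no 1H-15N cross-peak
    return [i + 1 for i in range(1, len(sequence))
            if sequence[i - 1] == a and sequence[i] == b]

# Hypothetical toy sequence, not a real protein.
toy = "MSFKILGSILDT"
print(hnco_peaks(toy, "S", "F"))  # [3]: pair occurs once, so the assignment is unambiguous
print(hnco_peaks(toy, "I", "L"))  # [6, 10]: pair occurs twice, so it is twofold degenerate
```

If the returned list has a single entry, the one cross-peak in that sample maps to one residue. A longer list corresponds to the degenerate case discussed later in the text.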
The experiments that we present here required the production of five protein samples. Each sample contains a different combination of 16 labeled amino acid types, each of which is either 100% 13C/15N labeled or 50% 15N/50% 14N labeled. The labeling scheme is shown in Figure 1. For each sample, two NMR spectra are acquired: a 1H–15N HSQC spectrum and a 1H–15N 2D HNCO spectrum. Following normalization of the spectra, comparison of the relative peak intensities in the HSQC spectra establishes the amino acid type of each peak. Arginine residues, for example, would have a peak intensity pattern in samples 2–5 of ½:1:½:½, respectively. The 16 amino acid types chosen here can be distinguished using these four samples, as there are 2⁴ (= 16) such patterns. For a particular cross-peak, the amino acid type of the preceding residue in the sequence is established by examining the presence or absence of peaks in the five 2D HNCO spectra. For instance, if the preceding residue is a threonine, then the HNCO cross-peaks in samples 2–5 will show the pattern absent:absent:present:absent. Therefore, all 16 × 16 possible amino acid pairs are identifiable simultaneously from these five samples. If proline is included in the set, as in Figure 1, the number is 15 × 16, because proline lacks an amide proton and gives no 1H–15N cross-peak of its own. If a pair appears n times in the sequence, then n peaks will appear in these spectra with the same intensity pattern, and the assignment will be n-fold degenerate.
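The decoding step can be sketched in a few lines of Python. The sketch makes several assumptions:

- The spectra have already been normalized.
- Each amino acid type is described by a 4-bit code over samples 2–5. A bit of 1 means 100% 13C/15N labeling and a bit of 0 means 50% 15N/50% 14N labeling.
- The code table is hypothetical and is not the Figure 1 scheme. Only R and T are placed so that their codes reproduce the two example patterns quoted above.

A fully labeled type gives intensity 1 in the HSQC. As a preceding residue, it also supplies the 13C carbonyl that an HNCO peak requires. For that reason the same table decodes both the residue itself (b) and its predecessor (a). `residue_type`, `preceding_type` and the 0.75 threshold are illustrative choices.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

# One bit per sample (samples 2-5): 1 = 100% 13C/15N labeled, 0 = 50% 15N/50% 14N (no 13C).
Code = Tuple[int, ...]

# HYPOTHETICAL order of the 16 labeled types; NOT the Figure 1 scheme.
# Only T (index 2 -> 0010) and R (index 4 -> 0100) are placed to match the text's examples.
# Proline is included; it can only ever be decoded as a preceding residue (type a).
AMINO_ACIDS = ["G", "A", "T", "V", "R", "L", "I", "F",
               "S", "P", "N", "D", "H", "K", "M", "Y"]
SCHEME: Dict[str, Code] = dict(zip(AMINO_ACIDS, product((0, 1), repeat=4)))
DECODE: Dict[Code, str] = {code: aa for aa, code in SCHEME.items()}

def residue_type(hsqc: Sequence[float], threshold: float = 0.75) -> str:
    """Type b of the residue itself, from normalized HSQC intensities (about 0.5 or 1.0)."""
    if len(hsqc) != 4:
        raise ValueError("expected intensities for samples 2-5")
    # Intensities above the midpoint between 0.5 and 1.0 are read as "fully labeled".
    return DECODE[tuple(int(x > threshold) for x in hsqc)]

def preceding_type(hnco_present: Sequence[bool]) -> str:
    """Type a of the preceding residue: an HNCO peak needs a 13C carbonyl on residue a."""
    if len(hnco_present) != 4:
        raise ValueError("expected HNCO presence flags for samples 2-5")
    return DECODE[tuple(int(p) for p in hnco_present)]

# Synthetic (made-up) values for a single cross-peak, not measured data.
b = residue_type([0.48, 1.02, 0.51, 0.47])       # 1/2:1:1/2:1/2 -> R
a = preceding_type([False, False, True, False])  # absent:absent:present:absent -> T
print(f"({a}){b}")  # (T)R
```

Placing the threshold midway between the two expected levels tolerates modest noise. Intensities well away from both levels are better treated as a warning sign than forced into a bit.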
We tested the method on a truncated form of the cycle3 variant of green fluorescent protein (GFP) from Aequorea victoria,8 a 27 kDa protein. The samples were labeled in the pattern depicted in Figure 1, using the rapid translation system 500 Escherichia coli HY kit (Roche Diagnostics Ltd). The HSQC spectrum of GFP with all 16 of the chosen set of amino acids fully 13C/15N labeled (sample 1) is shown in Figure 2A. Two cross-peaks are shown in detail in Figure 2B. The HSQC intensities correspond to residue types F (peak x) and L (peak y) (see Figure 1), and the HNCO patterns correspond to preceding residue types S and I, respectively. Therefore, the two cross-peaks can be assigned to amino acid pairs (S)F and (I)L, respectively. The amino acid pair SF occurs only once in the sequence, and therefore peak x can be assigned to F100. The amino acid pair IL occurs twice in the sequence, and thus peak y is assigned to either L15 or L137. The published assignment8 confirms these results.
This protein does not display ideal NMR characteristics. Under the sample conditions used here, we estimate the correlation time to be ca. 21 ns. The protein also displays regions of missing and weakened resonances8 due to conformational exchange broadening. In this demonstration of the feasibility of the method, 61 residues could be assigned to the correct (a)b pair. We anticipate that the assignment rate would be greatly increased in proteins with better NMR characteristics, which give improved sensitivity and resolution, and with refined amino acid formulations.
This method is qualitatively different from the traditional methods based on HNCA-type experiments that link spins and map them onto the primary sequence. Such methods are sensitive to the completeness of the data for all residues, since incorrect or incomplete information about one residue can confound assignment of another residue. In contrast, the assignment of a particular cross-peak in our method to a particular (a)b pair depends solely on information about that single cross-peak. (Cases of absolute resonance overlap would be detectable by the deviation of the relative peak intensities in the HSQC spectrum from 0.5 or 1.) Furthermore, our method uses two of the most sensitive NMR experiments (1H–15N HSQC and 1H–15N 2D HNCO), which makes it applicable to proteins with poor solubility or unfavorable tumbling characteristics. The analysis of the data in our technique is also much less demanding and could be carried out by someone with limited NMR experience. Thus, the method opens up opportunities for applying NMR to systems that give less than ideal spectra, for studying groups of related proteins rapidly in parallel, and for making NMR more accessible to non-NMR specialists. All elements of the technique, from protein production and NMR data collection through to data analysis, are well suited to robotic and computational automation.
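The overlap check in the parenthetical above can be sketched as follows. The sketch assumes that HSQC intensities are normalized to sample 1, where every type is fully labeled. This is our assumption and is not stated in the text. Under it, a single residue gives about 0.5 or 1.0 in each of samples 2–5.

Each residue carries a 4-bit labeling code, with 1 for fully labeled and 0 for 50% 15N in each of samples 2–5. Two completely overlapped residues then give (c1 + c2)/2. This equals 0.75 in any sample where their codes differ. Residues of the same type have identical codes, so this HSQC-only test would not flag them. The helper names and the 0.1 tolerance are hypothetical.

```python
from typing import List, Sequence

# Expected single-residue levels relative to sample 1 (assumed normalization):
# 0.5 for 50% 15N labeling, 1.0 for full labeling.
EXPECTED_LEVELS = (0.5, 1.0)

def looks_overlapped(hsqc: Sequence[float], tolerance: float = 0.1) -> bool:
    """True if any normalized intensity in samples 2-5 is far from both 0.5 and 1.0."""
    return any(min(abs(x - level) for level in EXPECTED_LEVELS) > tolerance
               for x in hsqc)

def overlapped_intensities(code1: Sequence[int], code2: Sequence[int]) -> List[float]:
    """Expected normalized intensities of two fully overlapped peaks.

    Each residue contributes 1.0 (bit 1) or 0.5 (bit 0); in sample 1 the pair sums
    to 2.0, so dividing by 2 gives the normalized value.
    """
    return [((0.5 + 0.5 * c1) + (0.5 + 0.5 * c2)) / 2.0
            for c1, c2 in zip(code1, code2)]

# Synthetic single-residue pattern: stays near the expected levels.
print(looks_overlapped([0.48, 1.02, 0.51, 0.47]))  # False
# Two residues whose codes differ in samples 3 and 4 give 0.75 there.
mixed = overlapped_intensities((0, 1, 0, 0), (0, 0, 1, 0))  # [0.5, 0.75, 0.75, 0.5]
print(looks_overlapped(mixed))  # True
```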
A clear disadvantage of the method is that only partial assignments are obtained. For instance, in GFP 43% of residues are in unique (a)b pairs, 35% are in (a)b pairs that occur twice in the protein sequence, 14% are in (a)b pairs that occur three times in the protein sequence, and 8% exhibit higher degeneracy. While incomplete, these assignments would still provide a very large number of residue-specific probes, which will in general be randomly distributed in the protein. In many applications, an incomplete assignment will be quite adequate, for instance, to resolve the choice between different models. Reese and Dötsch4 showed recently that binding information could be obtained by studying the patterns of chemical shift perturbations in a number of single amino acid-type labeled samples. By searching for a region with an appropriate amino acid composition they were able to identify a binding site, which they proposed could be verified using dual selective amino acid labeling to identify a particular single amino acid. Our technique provides much more detailed information with a comparable number of samples.
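A degeneracy breakdown like the one quoted above for GFP can be computed for any sequence by counting (a)b pairs. The sketch below is a hypothetical helper applied to a made-up sequence. It is not meant to reproduce the GFP figures, which depend on the actual sequence and labeling set. The counting rules are our assumptions:

- The first residue is skipped, since it has no predecessor.
- Proline is skipped as residue b, since it has no amide proton.
- Optionally, only pairs in which both types belong to a given labeled set are counted.

```python
from collections import Counter
from typing import Dict, Optional, Set

def degeneracy_fractions(sequence: str,
                         labeled: Optional[Set[str]] = None) -> Dict[str, float]:
    """Fraction of counted residues whose (a)b pair occurs 1, 2, 3 or more than 3 times."""
    pairs = []
    for i in range(1, len(sequence)):  # residue 1 has no preceding residue
        a, b = sequence[i - 1], sequence[i]
        if b == "P":
            continue  # proline gives no 1H-15N cross-peak
        if labeled is not None and (a not in labeled or b not in labeled):
            continue  # pair not covered by the (hypothetical) labeled set
        pairs.append((a, b))

    counts = Counter(pairs)  # occurrences of each (a)b pair
    bins: Counter = Counter()
    for pair in pairs:  # one entry per residue, binned by its pair's degeneracy
        n = counts[pair]
        bins[str(n) if n <= 3 else ">3"] += 1

    keys = ("1", "2", "3", ">3")
    total = len(pairs)
    if total == 0:
        return {k: 0.0 for k in keys}
    return {k: bins[k] / total for k in keys}

# Hypothetical toy sequence: IL occurs twice, every other pair once.
print(degeneracy_fractions("MSFKILGSILDT"))
```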
The method that we have presented here could be developed in various ways. In cases of severe spectral overlap, the number of amino acids that are 15N labeled could be reduced, while retaining the full pattern of 13C labeling. This would reduce the number of potentially overlapping cross-peaks in the 1H–15N planes, without compromising the ability to identify the nature of the preceding residue. Combining the method with segmental isotope labeling9 would both decrease the spectral complexity and increase the number of unambiguous assignments. The availability of deuterated 13C and 15N amino acids would also greatly extend the applicability of this method, by enabling the use of TROSY10 and saturation transfer11 methods.
The work was funded by the BBSRC, U.K. M.J.P. is a BBSRC David Phillips Fellow and University Research Fellow (University of Leeds). We acknowledge the provision of FELIX software from Accelrys. We thank Paul Gillingham and Mark Wells for assistance with the NMR spectroscopy and Sophie Jackson for a clone of GFP.