Although the rapid progress of NMR technology has significantly expanded the range of NMR-tractable systems, preparation of NMR-suitable samples that are highly soluble and stable remains a bottleneck for studies of many biological systems. The application of solubility-enhancement tags (SETs) has been highly effective in overcoming solubility and sample stability issues and has enabled structural studies of important biological systems previously deemed unapproachable by solution NMR techniques. In this review, we provide a brief survey of the development and successful applications of the SET strategy in biomolecular NMR. We also comment on the criteria for choosing optimal SETs, such as for differently charged target proteins, and on recent developments in NMR-invisible SETs.
The advancement of NMR instrumentation and methodology has made solution NMR spectroscopy an increasingly powerful tool for the investigation of protein structure and dynamics under physiological conditions and for studies of ligand binding and reaction mechanisms in solution. However, the inherent sensitivity limitation of NMR requires protein samples to be stable at high concentrations (> 100 µM for structural studies) for an extended period (typically over a couple of days). Unfortunately, an estimated 75% of soluble proteins and many biologically important macromolecules are characterized by low solubility and instability (Christendat et al. 2000). Therefore, preparation of well-behaved, non-aggregated samples at sufficiently high protein concentrations remains a serious challenge for structural and dynamic studies by NMR.
Numerous efforts have been devoted to overcoming the solubility and sample stability issues. For example, extensive buffer screening (Bagby et al. 1997; Lepre and Moore 1998), addition of charged amino acids (Golovanov et al. 2004), and introduction of point mutations (Huang et al. 1996; Ito and Wagner 2004; Sun et al. 1999) have been successfully utilized to increase the solubility of the target proteins. However, these methods are often protein specific, largely based on trial and error, and may not be easily applicable to other systems. To overcome these issues and develop a generic approach, we introduced the concept of non-cleavable solubility-enhancement tags (SETs) for studies of poorly behaving proteins by solution NMR (Zhou et al. 2001b). Since then, this strategy has found wide application in the NMR community, and has been used to improve the solubility and sample stability of ~30 proteins. For many of these examples, this approach has enabled successful determination of high-resolution solution structures. Here, we give a brief overview of the initial development, the theory and the successful application of the SET strategy in biomolecular NMR studies, and we comment on recent improvements of the SET strategy. We refer readers to the excellent review by Waugh for applications of protein tags in a non-NMR setting (Waugh 2005).
Protein tags such as GST and MBP have been widely used as affinity tags for purifying recombinant proteins (di Guan et al. 1988; Smith and Johnson 1988). It was frequently observed that these fusion proteins overexpress better and exhibit enhanced solubility and sample stability compared to their untagged counterparts. This observation has prompted the search for new fusion tags to improve the soluble expression of target proteins in E. coli (Davis et al. 1999; DelProposto et al. 2009; Forrer and Jaussi 1998; Huth et al. 1997; LaVallie et al. 2000; Pilon et al. 1996; Samuelsson et al. 1994; Zou et al. 2008; Zuo et al. 2005; reviewed by Waugh 2005). Due to the size limit of NMR techniques (~30 kDa), it is preferable to remove the protein tag before subsequent NMR studies. Unfortunately, once the fusion tag is cleaved by proteolytic digestion, the target protein often becomes unstable again and precipitates within hours, thereby prohibiting further NMR studies.
Because it is only the size limit that restricts the use of protein tags in solution NMR studies, we reasoned that a highly soluble and stable protein that is also sufficiently small can be used as a non-cleavable tag for NMR studies. Several small protein tags, such as protein G B1 domain (GB1, 56 residues) (Huth et al. 1997), protein D (110 residues) (Forrer and Jaussi 1998), the Z domain of Staphylococcal protein A (58 residues) (Samuelsson et al. 1994) and thioredoxin (109 residues) (LaVallie et al. 2000), have been shown to increase the yield of soluble proteins. We chose the smallest tag, GB1, as the solubility-enhancement tag for further evaluation. In our study of the DFF40/45 N-terminal CIDE domain complex, attachment of the non-cleavable GB1 tag to DFF45 not only increased the solubility of the DFF40/45 complex from 0.2 mM to 0.6 mM, but also increased the sample stability from 5 days to over a month at 23 °C (Zhou et al. 2001b). The use of the solubility-enhancement tag resulted in a dramatic improvement of spectral quality (Figure 1) and enabled subsequent structure determination of the DFF40/45 CIDE domain complex by NMR (Zhou et al. 2001a). To our knowledge, this was the first demonstration of using non-cleavable solubility-enhancement tags to overcome sample solubility and stability issues for structural studies by NMR.
Since the initial demonstration and application of the SET strategy to NMR structure determination (Zhou et al. 2001b; Zhou et al. 2001a), this fusion tag approach has found wide application in the NMR community. Approximately 30 examples showing significant enhancement of protein solubility and/or sample stability using SETs have now been reported in the literature (Table 1). Additionally, in many cases, the creation of SET-fusion proteins also significantly improved protein overexpression levels in E. coli and the final yields of the purified proteins. These target proteins cover a wide range of structural topologies and biological functions, demonstrating the generality of the SET approach in biomolecular NMR studies.
Although GB1 has been a highly successful solubility-enhancement tag, other highly soluble and stable small protein domains can also serve similar functions. Unfortunately, how the SET enhances the solubility of a target protein remains poorly understood, and comparative proteomic studies have not revealed a universally good tag for all protein targets (Hammarström et al. 2002; Hammarström et al. 2006). Based on a thermodynamic analysis, we suggest here the following criteria for choosing a solubility-enhancement tag.
Ideally, a solubility-enhancement tag should be “transparent” to the target protein, i.e., it should not perturb the structure or function of the target protein. In the absence of such prior knowledge, proper control experiments must be included to demonstrate the “inertness” of the solubility-enhancement tag for functional assays. Likewise, the lack of perturbations of tag resonances in the fusion protein provides a compelling argument that the solubility-enhancement tag does not interact with the target protein and is unlikely to alter its structure.
In this regard, GB1 appears to be remarkably “transparent” as demonstrated in a variety of GB1-fusion proteins in NMR studies (Table 1). Interestingly, many examples of the GB1-fusion proteins in NMR studies also display better sample stability at high concentrations (µM-mM). Because the “passive” GB1 tag is unlikely to alter the thermal stability of the target protein, the improved sample stability presumably results from the enhanced solubility and reduced aggregation of the fusion protein.
Because GB1 is slightly acidic (pI = 4.5), it may cause non-specific electrostatic interactions when fused to proteins with basic pI values. To avoid these non-specific interactions, we created a GB1 mutant (GB1basic) by introducing the mutations D22N, D36R, and E42K, which increased the pI of GB1 to 8.0 (Zhou and Wagner, unpublished). This basic GB1 tag was successfully utilized to prepare highly soluble HPV16 E6 samples and prevent non-specific electrostatic interactions between the tag and the target protein (Liu et al. 2009). Without the tag, the solubility of the E6 constructs was too low to record spectra (J. Baleja, private communication). Consistent with this notion of choosing a SET based on matching its charge state with that of the target protein, Harrison and co-workers showed in their statistical model that avoidance of charge neutralization increases the probability of producing soluble proteins in E. coli (Davis et al. 1999; Wilkinson and Harrison 1991).
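The direction of the pI change can be rationalized by simple charge bookkeeping. The sketch below is only an illustration and is not the procedure used to design GB1basic: it assigns formal side-chain charges near neutral pH (Asp/Glu = −1, Lys/Arg = +1, everything else 0) and sums the change over a list of mutations. The function names and the charge table are our own assumptions, and the code does not compute a pI.

```python
# Formal side-chain charges near neutral pH. This is a simplification:
# His, the chain termini, and pKa shifts in the folded protein are ignored.
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def parse_mutation(mutation: str) -> tuple[str, int, str]:
    """Split a label such as 'D22N' into (old residue, position, new residue)."""
    return mutation[0], int(mutation[1:-1]), mutation[-1]


def net_charge_shift(mutations: list[str]) -> int:
    """Sum the change in formal charge over a list of point mutations."""
    shift = 0
    for mutation in mutations:
        old, _position, new = parse_mutation(mutation)
        shift += SIDE_CHAIN_CHARGE.get(new, 0) - SIDE_CHAIN_CHARGE.get(old, 0)
    return shift


if __name__ == "__main__":
    gb1_basic = ["D22N", "D36R", "E42K"]  # mutations quoted in the text
    print(f"Formal charge shift: {net_charge_shift(gb1_basic):+d}")
```

Under this crude model, the three mutations shift the formal charge of GB1 by +5 units (+1 for D22N, +2 each for D36R and E42K), which is qualitatively consistent with the reported move from an acidic to a basic pI.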
It should be noted that an “active” fusion tag can also be highly effective. For example, Ikura and co-workers fused the TAF N-terminal Domain 1 and 2 (TAND12) with its binding partner TATA-binding protein (TBP) to form a stable protein complex, which displayed enhanced solubility and sample stability (Mal et al. 2007). However, such an “active” fusion tag is target specific and cannot be easily applied to other proteins.
Assuming that (1) there is no interaction between the tag and the target protein, (2) there is no structural change of either the tag or the target in the fusion protein, and (3) the contribution of the linker can be neglected, we give an estimation of the solubility-enhancement effect based on a simple thermodynamic model. Although the analysis below focuses on fusion proteins containing a single tag, it is straightforward to extend such an analysis to fusion proteins with multiple tags.
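The free energies of individually transferring A (the tag) and B (the target protein) from the solid state to the solution state are given by:
$$\Delta G_A = \Delta G_A^{\circ} + RT\ln[A], \qquad \Delta G_B = \Delta G_B^{\circ} + RT\ln[B] \tag{1}$$
where $\Delta G^{\circ}$ is the standard free energy of transfer, $R$ is the gas constant and $T$ is the temperature.
At equilibrium (i.e., at saturation), the free energy of transferring A or B from the solid state to the solution state is zero. Therefore one has:
$$\Delta G_A^{\circ} + RT\ln[A]_{\mathrm{sat}} = 0, \qquad \Delta G_B^{\circ} + RT\ln[B]_{\mathrm{sat}} = 0 \tag{2}$$
which can be rearranged to give
$$\Delta G_A^{\circ} = -RT\ln[A]_{\mathrm{sat}}, \qquad \Delta G_B^{\circ} = -RT\ln[B]_{\mathrm{sat}} \tag{3}$$
If there is no interaction between A and B, we can conceptually describe the transfer of the fusion protein A-B from the solid state to the solution state as two separate processes: transferring $A_{\mathrm{solid}}$ to $A_{\mathrm{solution}}$ and transferring $B_{\mathrm{solid}}$ to $B_{\mathrm{solution}}$. The free energy of such a combined transfer is zero at equilibrium:
$$\Delta G_A^{\circ} + RT\ln[A] + \Delta G_B^{\circ} + RT\ln[B] = 0 \tag{4}$$
Because the covalent linker requires
$$[A] = [B] = [A\text{-}B] \tag{5}$$
by substituting $[A]$ and $[B]$ with $[A\text{-}B]_{\mathrm{sat}}$, we can rewrite Eq. (4) as
$$\Delta G_A^{\circ} + \Delta G_B^{\circ} + 2RT\ln[A\text{-}B]_{\mathrm{sat}} = 0 \tag{6}$$
which, together with Eq. (3), requires
$$2\ln[A\text{-}B]_{\mathrm{sat}} = \ln[A]_{\mathrm{sat}} + \ln[B]_{\mathrm{sat}} \tag{7}$$
Therefore, we have the saturation concentration of the fusion protein as:
$$[A\text{-}B]_{\mathrm{sat}} = \sqrt{[A]_{\mathrm{sat}}\,[B]_{\mathrm{sat}}} \tag{8}$$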
We note that the above analysis does not account for changes in the composition of the solid or solution state, nor does it consider intermediate species (such as $A_{\mathrm{solid}}$–$B_{\mathrm{solution}}$ and $A_{\mathrm{solution}}$–$B_{\mathrm{solid}}$) of the solvation process. The latter approximation, in particular, can introduce a very large error in the solubility estimation of the fusion protein. Finally, strictly speaking, the concentration terms in the expression for the saturation concentration of the fusion protein should be effective concentrations (i.e., activities), which may deviate from the apparent protein concentrations. This effect is expected to be larger at higher concentrations, which can result in an overestimation of the effective tag concentration at saturation. Because of these limitations, the expression can only be used in a qualitative way. It nevertheless gives a useful evaluation of the beneficial effect brought by a solubility-enhancement tag.
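As a sketch of the multi-tag extension mentioned earlier, and under the same three assumptions, consider a fusion protein carrying $n$ identical, non-interacting copies of tag A. The notation $A_n\text{-}B$ and the symbol $n$ are ours and are introduced only for illustration. Treating the transfer as $n+1$ independent processes, each at the common concentration $[A_n\text{-}B]$, and using $\Delta G^{\circ} = -RT\ln[\,\cdot\,]_{\mathrm{sat}}$ for each free component gives

$$n\,\Delta G_A^{\circ} + \Delta G_B^{\circ} + (n+1)\,RT\ln[A_n\text{-}B]_{\mathrm{sat}} = 0 \quad\Longrightarrow\quad [A_n\text{-}B]_{\mathrm{sat}} = \left([A]_{\mathrm{sat}}^{\,n}\,[B]_{\mathrm{sat}}\right)^{1/(n+1)}$$

For $n = 1$ this reduces to the geometric mean $\sqrt{[A]_{\mathrm{sat}}[B]_{\mathrm{sat}}}$ of the single-tag case, which is the case considered in the rest of this section. Within this idealized model, each additional tag moves the estimate closer to $[A]_{\mathrm{sat}}$; the same caveats about intermediate species and activities apply.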
To give an example, we were able to make 15–20 mM GB1 solutions routinely without any noticeable precipitation. Using these numbers as the solubility of GB1, we estimate that the SET approach yields a saturation concentration of 1.2–1.4 mM or 0.38–0.44 mM for a target protein with an inherent solubility of 0.1 mM or 0.01 mM, respectively, corresponding to a ~10–40 fold enhancement of the solubility over the untagged protein. Experimentally, approximately 3–100 fold enhancements of solubility have been reported for GB1-fusion proteins (Hiller et al. 2003; Kobashigawa et al. 2009; Zhou et al. 2001b). The largest effect was reported for the pyrin domain of NALP1, whose solubility increased from ~10 µM to 1 mM (Hiller et al. 2003).
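The arithmetic behind these estimates can be reproduced with a few lines of Python. The sketch assumes the single-tag result of the thermodynamic model, i.e., that the saturation concentration of the fusion protein is the geometric mean of the saturation concentrations of the free tag and the free target. The function names are ours, and the estimate inherits all the caveats of the model; it is a qualitative guide rather than a prediction.

```python
import math


def fusion_saturation_mM(tag_sat_mM: float, target_sat_mM: float) -> float:
    """Geometric-mean estimate of the fusion protein's saturation concentration (mM)."""
    if tag_sat_mM <= 0 or target_sat_mM <= 0:
        raise ValueError("saturation concentrations must be positive")
    return math.sqrt(tag_sat_mM * target_sat_mM)


def fold_enhancement(tag_sat_mM: float, target_sat_mM: float) -> float:
    """Ratio of the estimated fusion solubility to the untagged target solubility."""
    return fusion_saturation_mM(tag_sat_mM, target_sat_mM) / target_sat_mM


if __name__ == "__main__":
    # GB1 solubility range (15-20 mM) and target solubilities taken from the text.
    for target in (0.1, 0.01):
        for tag in (15.0, 20.0):
            sat = fusion_saturation_mM(tag, target)
            fold = fold_enhancement(tag, target)
            print(f"tag {tag:4.1f} mM, target {target:5.2f} mM -> "
                  f"fusion {sat:.2f} mM ({fold:.0f}-fold)")
```

Because the enhancement factor equals the square root of the ratio of tag to target solubility, the relative benefit grows as the target becomes less soluble, even though the absolute concentration reached is lower.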
The thermodynamic estimate of the saturation concentration of the fusion protein argues that proteins with higher intrinsic solubility, but not those with larger molecular weights, function as better tags. Although this conclusion may seem counterintuitive, several large-scale solubility studies have consistently categorized the small GB1 tag (5.6 kDa) as one of the most effective tags to use (Hammarström et al. 2002; Hammarström et al. 2006). For example, Hammarström et al. compared the effect of different tags on the solubility of 27 small- to medium-sized human proteins, and ranked GB1, MBP and thioredoxin as the best tags (Hammarström et al. 2002). The authors concluded that there was no statistically significant difference among GB1, MBP and thioredoxin in their ability to enhance the solubility of a target protein. It is important to note that in most of these studies, the solubility (often reported as gel intensity) reflects the mass yield of the fusion proteins, not that of the untagged target proteins. This could lead to an overestimation of the solubility-enhancement effect for large tags such as MBP or NusA. After correcting for the molecular weight contributions from different tags, Hammarström et al. concluded that GB1 gave a significantly larger amount of soluble target proteins for the 45 human proteins tested (Hammarström et al. 2006).
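The molecular weight correction can be illustrated with a short calculation. The sketch below is not the analysis used in the cited studies; it simply scales a fusion-protein mass yield by the fraction of the fusion mass that belongs to the target. The function names are ours, and apart from the 5.6 kDa quoted above for GB1, the tag masses are rounded approximate values that we assume for illustration.

```python
# Approximate tag masses in kDa. GB1 is the value quoted in the text;
# the other entries are rounded, assumed values for illustration only.
APPROX_TAG_KDA = {"GB1": 5.6, "thioredoxin": 12.0, "MBP": 42.0, "NusA": 55.0}


def target_mass_fraction(target_kda: float, tag_kda: float) -> float:
    """Fraction of the fusion protein's mass contributed by the target."""
    if target_kda <= 0 or tag_kda < 0:
        raise ValueError("target mass must be positive and tag mass non-negative")
    return target_kda / (target_kda + tag_kda)


def target_yield_mg(fusion_yield_mg: float, target_kda: float, tag_kda: float) -> float:
    """Convert a fusion-protein mass yield into the mass of target it contains."""
    return fusion_yield_mg * target_mass_fraction(target_kda, tag_kda)


if __name__ == "__main__":
    target_kda = 15.0       # hypothetical target protein
    fusion_yield_mg = 10.0  # hypothetical, identical mass yield for every fusion
    for tag, tag_kda in APPROX_TAG_KDA.items():
        yield_mg = target_yield_mg(fusion_yield_mg, target_kda, tag_kda)
        print(f"{tag:12s} {yield_mg:5.2f} mg of target per {fusion_yield_mg:.0f} mg of fusion")
```

For the same band intensity on a gel, a fusion with a small tag therefore contains several times more target protein by mass than a fusion with a large tag, which is why uncorrected comparisons favor the larger tags.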
Finally, we would like to emphasize that the solubility estimate for the fusion protein is based on a thermodynamic analysis. It assumes no interaction between the tag and the target protein and requires the solvation process to be fully reversible. Several protein tags have been shown to facilitate protein folding in E. coli by promoting disulfide bond formation (Stewart et al. 1998), by serving as a molecular chaperone (Bach et al. 2001; Kapust and Waugh 1999) or by enhancing transcription pausing (Davis et al. 1999). In these scenarios, the significantly better “solubilizing” effect of the “active” tags over “passive” tags may reflect a benefit to folding kinetics rather than to thermodynamics.
Because NMR experiments are performed under a variety of pH, temperature and buffer conditions, a good solubility-enhancement tag should be stable under these conditions. The rapid two-state refolding property of a tag can also be highly beneficial. For example, in the study of mutant myotoxin a (MyoP20G), Cheng and Patel reported that GB1 appears to increase protein (re)folding efficiency (Cheng and Patel 2004), which likely comes from the enhanced solubility (and reduced aggregation) of the denatured fusion protein.
As reported in early literature, a successful solubility-enhancement tag often enhances protein overexpression levels and increases the yields of the purified proteins. Some tags, such as MBP and thioredoxin, have been suggested to serve as chaperones to promote proper folding of target proteins (Bach et al. 2001; Kapust and Waugh 1999; Kern et al. 2003). Although similar benefits in protein expression levels and yields have been observed for GB1-fusion proteins (Table 1; also see studies by Hammarström et al. (Hammarström et al. 2002; Hammarström et al. 2006)), the experimental evidence for the chaperone activity of GB1 is lacking. It should be noted that such effects do not have to derive from the chaperone activity. The enhanced solubility of the fusion protein itself is expected to facilitate protein folding and overexpression in vivo and increase the yield of protein purification in vitro by reducing protein aggregation and precipitation.
Several studies reported diminished effects of SETs on the E. coli expression of large proteins (>25–30 kDa) in soluble fractions (Hammarström et al. 2002; Hammarström et al. 2006). Because large proteins frequently require chaperones or binding partners to fold properly, it is likely that these observations reflect an intrinsic folding (kinetic) problem of the large proteins, rather than the ineffectiveness of SETs.
Despite the success of the SET approach, it still brings a sizeable amount of extra signals from the protein tag. For a target protein of 10–20 kDa, inclusion of a small GB1 tag (56 residues) easily adds about a quarter to a half of “extra” signals to those from the untagged protein. Although the excellent signal dispersion and the lack of resonance perturbation make the tag signals easy to identify, they nevertheless bring extra burden and complexity for resonance assignment.
Recently, two types of NMR-invisible tags have been used to overcome this issue (Figure 2) (Durst et al. 2008; Kobashigawa et al. 2009; Züger and Iwai 2005). Both approaches start from an isotopically enriched fusion protein containing a cleavable solubility tag. A second, unlabeled solubility tag, which is invisible by NMR, is then introduced to maintain solubility. The isotopically labeled tag is subsequently removed to generate the final form of the NMR sample.
The two approaches differ in how the NMR-invisible tag is introduced. In the first approach, the unlabeled GB1 tag was attached to the isotopically labeled chitin-binding domain or the Vav C-terminal SH3 domain using either an intein-based or a sortase-mediated protein ligation strategy (Kobashigawa et al. 2009; Züger and Iwai 2005). Because the yield of the final fusion protein depends on the ligation efficiency, optimization of the ligation conditions is critical for the general application of this approach. In the second approach, a calmodulin-binding peptide (CBP, 23 residues) was included in the construct of the GST-tagged target protein (Durst et al. 2008). Unlabeled calmodulin, which binds the CBP, was added to the solution. After formation of the calmodulin/CBP complex, the isotopically labeled GST-tag was removed by proteolytic cleavage, and the unlabeled calmodulin served as the NMR-invisible solubility-enhancement tag. Because the latter approach bypasses the protein ligation step completely, it is more convenient to use. However, there is no reason why one should be restricted to the CBP tag of 23 residues; systems using shorter peptides and the corresponding high-affinity binding partners are likely to emerge in the future.
The preparation of highly soluble and stable samples represents a significant challenge for solution NMR studies of proteins with inherent poor solubility and stability. The use of solubility-enhancement tags has been demonstrated to overcome sample solubility and stability barriers and has enabled detailed structural analyses of many poorly-behaving proteins. The recent development of NMR-invisible tags promises to further expand the application of the SET strategy in biomolecular NMR.