One of the most important but still poorly understood issues in protein chemistry is the relationship between sequence and stability of proteins. Here, we present a method for analyzing the influence of each individual residue on the foldability and stability of an entire protein. A randomly mutated library of the crystallizable fragment of human immunoglobulin G class 1 (IgG1-Fc) was expressed on the surface of yeast, followed by heat incubation at 79 °C and selection of stable variants that still bound to structurally specific ligands. High throughput sequencing allowed comparison of the mutation rate between the starting and selected library pools, enabling the generation of a stability landscape for the entire CH3 domain of human IgG1 at single residue resolution. Its quality was analyzed with respect to (i) the structure of IgG1-Fc, (ii) evolutionarily conserved positions and (iii) in silico calculations of the energy of unfolding of all variants in comparison with the wild-type protein. In addition, this new experimental approach allowed the assignment of functional epitopes of structurally specific ligands used for selection [Fc γ‐receptor I (CD64) and anti-human CH2 domain antibody] to distinct binding regions in the CH2 domain.
► Investigation of the relationship between sequence and stability of the CH3 domain.
► Knowledge about the impact of all individual amino acids on the thermal stability.
► First construction of a stability landscape for an entire protein domain.
► Mapping of the binding site of Fc γ-receptor I on human IgG1.
► New and generally applicable method for protein characterization and engineering.
Stability is among the most critical factors influencing the applicability of a protein. Nevertheless, the relationship between the sequence and stability of proteins is one of the still unsolved issues in protein chemistry. Site-directed mutagenesis followed by analysis of purified protein variants has been the method of choice for experimentally analyzing the influence of a particular amino acid residue on the overall stability of the protein.1 This approach significantly contributed to our current understanding of the determinants of protein stability. However, due to the necessity of expressing, purifying and analyzing each protein variant individually, the number of mutations that can be analyzed within reasonable time is rather limited.
This drawback can be overcome by application of in vitro selection methods, such as ribosome display,2,3 phage display4,5 or yeast display,6 where the phenotype of a protein is linked to its genotype. These display technologies are mostly applied for selection of mutations with beneficial effects on protein function (e.g., antigen binding) from a randomly mutated library. Alternatively, display methods have also been used for identification of mutations negatively interfering with a certain protein function. For example, epitopes have been identified by selection of ribosome- or phage-displayed protein libraries for binding to a ligand and subsequent analysis of the obtained pool by sequencing.7–9 Mutations interfering with ligand binding are eliminated from the library during selection, thereby enabling identification of the functional epitope. These studies demonstrated the applicability of display technologies for identification of residues that are essential for protein function.
However, in order to determine regions or specific positions where the mutation frequency is decreased during selection, high numbers of sequences are necessary. Because these studies relied on Sanger sequencing, the approach was restricted to low sequence numbers, thereby limiting the statistical significance of the results or the number of residues that could be analyzed simultaneously in one experiment. Recently, this limitation was eliminated by Fowler et al., who characterized the interaction of the WW domain of the human Yes-associated protein 65 (hYAP65) with a peptide ligand by combining phage display selection of a randomly mutated WW domain library with high throughput sequencing.10
Yeast display is another highly potent display technology.6 Besides many other applications, this method has also been utilized for stability engineering of various proteins including single-chain T‐cell receptors11,12 and human IgG1-Fc13 [crystallizable fragment of immunoglobulin G class 1 (IgG1)]. In these studies, randomly mutated protein libraries were displayed on yeast, followed by heat incubation and selection of variants that still bound to structurally specific ligands.14 Thereby stabilized mutants, which are more resistant to irreversible thermal denaturation, were generated. In addition, yeast display has also been applied for measuring the thermal stability of yeast‐displayed proteins.15 Various single-chain variable antibody fragments and single‐chain T‐cell receptors were expressed on the surface of yeast, and after incubation at temperatures ranging from 30 to 80 °C, they were analyzed for binding to structurally specific ligands, resulting in the generation of thermal denaturation curves. Importantly, the temperatures of half‐maximal irreversible denaturation strongly correlated between yeast‐displayed proteins and solubly expressed proteins.15 Together, these studies strongly underlined that yeast display can be used for separation of protein variants based on their thermal stability.
The aim of the present study was the construction of a stability landscape of the CH3 domain of human IgG1 by combining the stability-based selection described above with high throughput sequencing. An IgG1-Fc library was generated by error‐prone polymerase chain reaction (PCR), displayed on yeast and subjected to heat incubation. After cooling, yeast cells displaying stable IgG1-Fc variants were selected by probing them for binding to Fc γ-receptor I (FcγRI) or to an antibody directed against the CH2 domain (anti-CH2). Subsequently, the original and the selected libraries were analyzed by high throughput sequencing. Obtained data reflect the impact of each individual residue on protein folding and stability, resulting in the generation of a stability landscape. Evolutionarily conserved residues, as well as residues located at the interface between the two CH3 domains of the homodimer, are shown to be less tolerant to mutation. Moreover, it is demonstrated that changing the ligand used for selection (anti-CH2 versus FcγRI) allows identification of the corresponding ligand binding sites in the CH2 domains.
Homodimeric IgG1-Fc consists of two polypeptide chains, each of which contains the hinge region, one CH2 domain and one CH3 domain (Fig. 1a). Interaction of the two chains is mainly mediated by two disulfide bridges in the hinge region and by extensive contacts between residues of the two CH3 domains.16 Moreover, a glycosylation linked to N297 (Eu numbering system17) is located between the two CH2 domains. In this study, we aimed to construct a stability landscape for the CH3 domain of this homodimeric protein.
First, a library of IgG1-Fc variants was constructed. Point mutations were distributed over the entire Fc gene including hinge region, CH2 domain and CH3 domain (total length of 220 amino acids) by error-prone PCR, resulting in an average of 1.5 amino acid mutations per Fc. Since the library size was 2 × 10⁶, the number of amino acid mutations in the library was 3 × 10⁶, exceeding the number of possible amino acid (4.2 × 10³) and nucleotide substitutions (2.0 × 10³) by approximately 3 orders of magnitude. Therefore, every amino acid substitution that is reached with only one nucleotide change will be represented in the library several times.
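The coverage estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted in the text; the variable names are illustrative, and the count of nucleotide substitutions assumes three alternative bases at each of the 660 positions.

```python
# Library coverage estimate from the figures quoted in the text.
n_residues = 220                    # hinge region + CH2 + CH3
n_nucleotides = n_residues * 3      # 660 bp
library_size = 2e6                  # number of clones
mutations_per_clone = 1.5           # average amino acid mutations per Fc

# Each residue can change to 19 other amino acids,
# each nucleotide to 3 other bases (assumption stated above).
possible_aa_substitutions = n_residues * 19
possible_nt_substitutions = n_nucleotides * 3

total_aa_mutations = library_size * mutations_per_clone

# How many times the library exceeds each substitution space.
fold_over_aa = total_aa_mutations / possible_aa_substitutions
fold_over_nt = total_aa_mutations / possible_nt_substitutions

print(f"possible amino acid substitutions: {possible_aa_substitutions}")
print(f"possible nucleotide substitutions: {possible_nt_substitutions}")
print(f"fold excess over amino acid space: {fold_over_aa:.0f}")
print(f"fold excess over nucleotide space: {fold_over_nt:.0f}")
```

Both fold values are of the order of 10³, consistent with the statement that the library exceeds the substitution space by roughly three orders of magnitude.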
This IgG1-Fc library was expressed on the surface of yeast by fusing it to the C-terminus of the yeast cell wall protein Aga2p (Supplemental Fig. 1). Subsequently, the yeast suspension was incubated for 10 min at 79 °C, which is in the range of the temperatures of unfolding (Tm) of the CH3 domains: the CH3 domains of native IgG1-Fc denature at 82 °C,18 whereas the CH3 domains of IgG1-Fc expressed in Pichia pastoris show two transitions at 78 and 83 °C, respectively.13 Thus, this thermal stress resulted in partial unfolding of the displayed Fc proteins dependent on their thermal stabilities. After cooling, the surface-displayed Fc variants were probed for binding to the structurally specific ligands FcγRI (also termed CD64) and anti-CH2 (an antibody directed against the CH2 domain; clone MK 1 A6). When probed with anti-CH2 and FcγRI, 63% and 60% of the cells, respectively, were negative for ligand binding but positive for the expression marker (Xpress) (Fig. 1c, first column). Lack of binding can be caused either by (i) mutations located in the epitope of the respective ligand or by (ii) mutations impairing the native fold or the thermal stability of the Fc protein. In order to investigate how much of this non-binding fraction after heat incubation is attributable to unstable clones, we also tested the library for ligand binding in the absence of the heat denaturation step, yielding approximately 38% and 42% negative clones when probed for binding to anti-CH2 and FcγRI, respectively (Fig. 1b, first column). Those cells express Fc variants that lost ligand binding due to (i) interference with the epitope or (ii) misfolding. Moreover, genetic aberrations such as frameshifts or stop codons could also be a reason for negativity. However, sequences of Fc mutants containing frameshifts or stop codons were excluded from further analysis (see below) and therefore did not interfere with the outcome of this study.
Next, the library was displayed on yeast again, followed by heat incubation (79 °C, 10 min). After cooling, the surface‐displayed Fc variants were probed for binding to FcγRI, and the top 5% were selected by flow cytometric sorting. Importantly, comparison of the original and the selected library pools demonstrates that the majority of the cell population, which does not bind to structurally specific ligands after heat stress, was eliminated during the selection process (Fig. 1c, compare first and second columns).
Next, the selection was repeated, starting with the same library. The only difference from the first experiment was the use of anti-CH2 instead of FcγRI for flow cytometric sorting. Similar to the library that was obtained by FcγRI selection, this anti-CH2-sorted library also lost the majority of the FcγRI-negative and anti-CH2-negative cell population (Fig. 1c, third column). Remarkably, selection for FcγRI binding eliminated not only FcγRI-negative cells but also anti-CH2-negative cells and vice versa. This demonstrates that, for the majority of Fc variants, the reduced binding to FcγRI or anti-CH2 after heat incubation was caused not by interference of mutations with ligand binding, but rather by interference with the folding process and/or the stability of the protein. However, the elimination of negative cells was slightly more efficient when analyzed with the same ligand that was also used for selection (~ 3–5% difference), most probably reflecting mutations located in the ligand binding sites.
For further characterization of the libraries, plasmid DNA was isolated and analyzed by high throughput sequencing. The two selected Fc libraries were analyzed in separate sequencing experiments, in both cases together with the original library. Fc-wt (recombinant wild-type Fc protein) was also included in order to estimate the sequencing error rate. As the analyzed region (660 bp) exceeded the read length limit of the 454 sequencing technology (~ 400 bp), each library was amplified in two PCRs covering the CH2 domain (together with the C-terminal part of the hinge region; 114 amino acid positions) and the CH3 domain (106 positions), respectively. Apart from short reads, sequences containing frameshifts, which are the most common sequencing errors of the 454 technology, were also excluded from further analysis. The number of reads of the CH2 and CH3 domains of each library that passed the quality control is listed in Supplemental Table 1.
From this set of sequences, the average mutation rate was calculated for the CH2 and CH3 domains of all libraries and Fc-wt. The average mutation rate was ~ 0.01% in both domains of the Fc-wt sample, demonstrating that the number of sequencing errors was negligible compared to the number of mutations in the libraries (Table 1). In the original library, about 0.7% of the residues were mutated with no striking differences between the two sequencing experiments and between the two domains. However, selection for either FcγRI or anti-CH2 binding after heat stress resulted in reduced mutation rates in both domains with a more pronounced decrease in the CH3 domains. In the CH3 domains, only 30% and 27% of the mutations remained after selection for binding to FcγRI and anti-CH2, respectively, as opposed to 40% and 45% in the CH2 domains (Table 1, bottom). This reflects the thermal unfolding pathway of IgG1-Fc with the CH2 domain denaturing reversibly, as long as the irreversibly unfolding CH3 domain retains its native fold.18 This means that mainly the thermal stability of the CH3 domain determines whether the protein still binds to structurally specific ligands after heat incubation. Therefore, in this experimental setup, the selection pressure is higher in the CH3 domain, resulting in a stronger reduction of mutations during selection compared to the CH2 domain. From the latter domain, only mutations interfering with the foldability or the ability to refold after thermal denaturation or mutations located in the binding sites of the ligands used for selection will be eliminated.
Finally, the total mutation rate was determined for each amino acid position. Subsequently, the mutation rates after selection were divided by the ones before, yielding the change in the mutation rate (Figs. 2 and 3 for the CH2 and CH3 domains, respectively). This number indicates the tolerance of a particular residue to mutation, which is a parameter for its importance for either (i) the structural integrity of the protein (fold and stability) or (ii) the binding to the respective ligand. As the CH2 domain only refolds after thermal denaturation if the irreversibly denaturing CH3 domain retains its native fold,18 ligands binding to the CH2 after heat stress can be used as folding and stability sensors of the CH3 without the need of directly binding to this domain. Since CH2-specific ligands were chosen for selection, the mutation rate changes in the CH3 domain can mainly be attributed to the impact of the respective residue on the fold and the thermal stability, resulting in the generation of a stability landscape (Fig. 3).
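The per-position calculation described above (mutation rate after selection divided by the rate before selection) can be sketched as follows. This is an illustrative outline only: the function name, the input layout and the read counts are assumptions and do not come from the study.

```python
def mutation_rate_changes(before, after):
    """Return the per-position change in mutation rate (after / before).

    `before` and `after` map a residue position to a tuple
    (mutated_reads, total_reads). Positions without any mutation
    before selection get None, because the ratio is undefined there.
    """
    changes = {}
    for position, (mut_before, total_before) in before.items():
        mut_after, total_after = after[position]
        rate_before = mut_before / total_before
        rate_after = mut_after / total_after
        changes[position] = rate_after / rate_before if rate_before > 0 else None
    return changes


# Hypothetical read counts for two positions (not data from the study).
before = {424: (70, 10_000), 425: (70, 10_000)}
after = {424: (45, 9_000), 425: (2, 8_000)}

changes = mutation_rate_changes(before, after)
for position, change in sorted(changes.items()):
    print(position, f"{change:.1%} of mutations remaining")
```

A small value means that mutations at this position were largely eliminated during selection, that is, the residue is intolerant to mutation.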
In order to validate this stability landscape, we investigated whether there was a stronger reduction of mutations at evolutionarily conserved positions. The IgG1-CH3 domains from 12 species were aligned, and 27 residues, which are identical in all sequences, were identified (Fig. 3, blue bars, and Fig. 6, blue dots). Indeed, the average percentage of remaining mutations after selection was significantly lower at evolutionarily conserved positions than at non-conserved positions: 7% versus 22% in the FcγRI selection and 4% versus 17% in the anti-CH2 selection (Fig. 4a).
For further validation, the ΔΔG values were predicted for all mutations in the CH3 domain using the software FoldX.19–21 Subsequently, at each position, the median ΔΔG of all 19 possible mutations was calculated, thereby obtaining a prediction value that indicates the impact of the respective residue on the stability of IgG1-Fc. These median ΔΔG values were compared to the mean of the mutation rate changes from the two selections. As Fig. 5 clearly demonstrates, intolerance to mutation in the selection experiments strongly correlated with high ΔΔG values.
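The reduction of 19 predicted ΔΔG values to one median per position could look like the sketch below. It assumes that the predictions have already been exported from the prediction software into a plain mapping; the function name and all numbers are hypothetical placeholders.

```python
from statistics import median


def median_ddg_per_position(ddg_by_position):
    """Collapse the predicted ddG values of all substitutions at each
    position into a single median value per position."""
    return {pos: median(values) for pos, values in ddg_by_position.items()}


# Hypothetical predictions (kcal/mol) for the 19 substitutions at two positions.
ddg_by_position = {
    424: [0.1 * i - 0.5 for i in range(19)],   # mostly mild effects
    425: [3.0 + 0.1 * i for i in range(19)],   # strongly destabilizing
}

median_ddg = median_ddg_per_position(ddg_by_position)
for position, value in sorted(median_ddg.items()):
    print(position, round(value, 2))
```

Using the median rather than the mean keeps a single extreme prediction from dominating the value assigned to a position.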
Comparison of the mutation rate changes of the two experiments shows a strong correlation for residues in the CH3 domain (Fig. 6b). This means that if a certain residue was intolerant to mutation in the FcγRI selection, this was also observed in the anti-CH2 selection. By contrast, at many positions in the CH2 domain, the mutation rate changes did not correlate (Fig. 6a). In order to determine the residues where the tolerance to mutation strongly differed between the two experiments, we calculated the FcγRI/anti-CH2 ratio by dividing the change in the mutation rate during FcγRI selection by the one during anti-CH2 selection. Positions where this ratio was higher than 8 or lower than 1/8 were defined as being implicated in binding to either FcγRI or anti-CH2, as the usage of the ligand was the only difference between the two experiments. At residues C229, L235, G236, G237, P238, V263, V264, D265, Q295 and N325, the FcγRI/anti-CH2 ratio was lower than 1/8, strongly indicating that these residues are involved in FcγRI binding (Fig. 2, green arrows, and Fig. 7).
On the other hand, at positions K248, S254, R255 and D280, the tolerance to mutation was strongly reduced in the anti-CH2 compared to the FcγRI selection (FcγRI/anti-CH2 ratio above 8), suggesting that the epitope of this antibody is located at the C-terminal part of the CH2 domain, where K248, S254 and R255 form a coherent surface (Fig. 7). Further residues in this area, where the reduction of mutations was more pronounced in the anti-CH2 experiment, include K246 and K290 in the CH2 domain and Q386 and N434 in the CH3 domain (Figs. 2 and 3). However, at these positions, the FcγRI/anti-CH2 ratio was not above 8, indicating that these residues were not as important for recognition by this antibody as the ones described above.
In any case, the binding site of FcγRI was assigned to the hinge-proximal region of the CH2 domain, which is in agreement with various other studies (see Discussion below). The epitope of the anti-CH2 antibody was located at the C-terminal part of this domain. In both cases, these binding sites can easily be recognized by comparison of the mutation rate changes depicted in the space‐filled models in Fig. 7.
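The ratio criterion used above to assign residues to one of the two binding sites can be written down compactly. The sketch below uses the threshold of 8 from the text; the function name, the labels and the handling of a zero denominator are assumptions made for illustration.

```python
def assign_binding_site(change_fcgr1, change_anti_ch2, threshold=8.0):
    """Classify a position from its mutation rate changes in the two selections.

    A ratio below 1/threshold means mutations were eliminated far more
    strongly in the FcgammaRI selection; a ratio above threshold means
    the same for the anti-CH2 selection.
    """
    if change_anti_ch2 == 0:
        # Ratio undefined or infinite; treated here as an anti-CH2 hit
        # only if mutations survived the FcgammaRI selection (assumption).
        return "anti-CH2" if change_fcgr1 > 0 else "undetermined"
    ratio = change_fcgr1 / change_anti_ch2
    if ratio < 1 / threshold:
        return "FcgammaRI"
    if ratio > threshold:
        return "anti-CH2"
    return "neither"


# Hypothetical mutation rate changes (fraction of mutations remaining).
print(assign_binding_site(0.02, 0.40))  # ratio 0.05
print(assign_binding_site(0.50, 0.05))  # ratio 10
print(assign_binding_site(0.20, 0.20))  # ratio 1
```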
The correlation between predicted ΔΔG values and changes in the mutation rate during selection, together with the fact that evolutionarily conserved residues are significantly more intolerant to mutation than non-conserved ones, strongly supports the quality of the constructed CH3 stability landscape.
Its analysis reveals that there are certain regions in the protein where mutations are strongly eliminated during selection, such as the CH3–CH3 interface, which is responsible for homodimer formation. This was statistically confirmed by dividing the residues of the CH3 domain into two groups: (i) interface residues (marked with black arrows in Fig. 3) with at least one atom (including backbone atoms) that is located within 4 Å from the CH3 domain of the other chain and (ii) all other amino acids that are not in close proximity to the other chain (> 4 Å) [based on X-ray structure with Protein Data Bank (PDB) ID 1OQO]. In both selection experiments, the reduction of mutations was significantly more pronounced in the interface group (Fig. 4b), demonstrating the importance of CH3–CH3 interactions for the native fold and the stability of IgG1-Fc. Staining all positions in the crystal structure of human IgG1-Fc according to their mutation rate changes during selection clearly shows the patch of interface residues, which are highly intolerant to mutation (Fig. 7, structures on the left).
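The 4 Å grouping described above amounts to a minimum-distance test between the atoms of each residue and the atoms of the other chain. The following sketch shows that test on toy coordinates; parsing of the PDB file is omitted, and the function name, data layout and coordinates are hypothetical.

```python
import math


def interface_residues(chain_a, chain_b_atoms, cutoff=4.0):
    """Return the residues of chain A with at least one atom within
    `cutoff` angstroms of any atom of chain B.

    `chain_a` maps a residue number to a list of (x, y, z) atom
    coordinates; `chain_b_atoms` is a flat list of (x, y, z) tuples.
    """
    interface = set()
    for residue, atoms in chain_a.items():
        if any(math.dist(a, b) <= cutoff for a in atoms for b in chain_b_atoms):
            interface.add(residue)
    return interface


# Toy coordinates (not taken from a real structure).
chain_a = {
    366: [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    380: [(20.0, 0.0, 0.0)],
}
chain_b_atoms = [(4.5, 0.0, 0.0), (30.0, 30.0, 30.0)]

print(interface_residues(chain_a, chain_b_atoms))
```

For a full-size structure, a spatial index would avoid the all-against-all comparison, but for a single domain pair the direct loop is usually fast enough.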
Interestingly, in the C- and F-strands of the outer β-sheet (positions 378–383 and 423–428, respectively), residues that were tolerant and intolerant to mutation alternated (Fig. 3). This is caused by the orientation of residues in a β-sheet. The hydrophobic side chains of V379, W381, F423 and V427, as well as the disulfide bond forming residue C425, all of which are directed to the hydrophobic core of the CH3 domain (Supplemental Fig. 2) and interact with residues of the inner β-sheet, were highly intolerant to mutation. In contrast, the solvent‐exposed side chains of A378, E380, E382, S424, S426 and M428 (Supplemental Fig. 2) were more tolerant to mutation. This strongly suggests that the intolerance to mutation of a particular residue is not primarily caused by its localization in a secondary structural element (in this case, in a β-sheet), but by side‐chain interactions with other parts of the molecule. In contrast to the C- and F-strands of the outer β-sheet, this alternating pattern was not observed at the inner β-sheet (Fig. 3). There, all side chains interact with other parts of the CH3 dimer: on one side, with the CH3 domain of the other chain, and on the other side, with the outer β-sheet of the same domain.
Furthermore, mutations were also strongly eliminated from positions C367 and C425, which form an intradomain disulfide bond between the B- and F-strands of the CH3 β-sheet structure. This finding is in agreement with previous studies, showing that the reduced form of the CH3 domain of IgG1 is less resistant to denaturation by heat, guanidinium chloride or low pH compared to the oxidized form.22,23 Furthermore, the fact that, at both positions, more than 96% of the mutations were eliminated within only one selection round demonstrates the high efficiency of this experimental strategy.
In another study, the impact of alanine mutations on the stability of solubly expressed single-chain CH3 heterodimers (with only one CH3 domain containing the mutation) from human IgG1 was assessed.16 In particular, highly destabilizing effects were observed for alanine mutations at the following six positions: T366, L368, P395, F405, Y407 and K409. The latter residue forms an interdomain hydrogen bond with D399. Surprisingly, only marginal destabilization was detected after mutating D399 to alanine. In our selections, all these positions, including D399, were highly intolerant to mutation, confirming the high impact of these residues on the stability of the CH3 dimer. The discrepancy concerning the tolerance of residue D399 to mutation could arise from the different formats that were applied in the two studies: in the present study, the entire Fc was expressed and mutations were present in both domains, as opposed to the single-chain CH3 heterodimer with the mutation only being present in one of the two chains. Moreover, we analyzed the changes in the total mutation rate, including mutations to various other amino acids, whereas in the other study, only mutations to alanine were tested.
The high impact of residues T366 and Y407 on protein stability was also found in another work on human IgG1-CH3 dimers.24 Differential scanning calorimetry revealed that the T366W and Y407A homodimers were destabilized by 26 and 22 °C compared to the wild-type protein, respectively.
A further residue that has been demonstrated to be highly important for the biophysical properties of the CH3 domain is P374. Isomerization from trans- to cis-proline has been shown to be the rate‐limiting step in CH3 folding.25 Moreover, the authors demonstrated that prolyl isomerization at P374 is necessary for the formation of the CH3 homodimer. Accordingly, mutations were strongly eliminated from this position with only 4% remaining after selection.
Apart from construction of the stability landscape of the CH3 domain, the applied method additionally provided information about ligand binding sites. As outlined above, residues C229, L235, G236, G237, P238, V263, V264, D265, Q295 and N325 are suggested to be involved in FcγRI binding (Fig. 2, green arrows, and Fig. 7). Indeed, the localization of the FcγRI binding site at the N-terminal part of the CH2 domain and in close proximity to the hinge region is in agreement with various other studies on human IgG1. In particular, the lower hinge region including residues 233–238 has been shown to be essential for recognition of human IgG1 by FcγRI.26–29 In the present study, C229 also seems to play a critical role in FcγRI binding. Mutating this residue may have a severe impact on the overall structure of the hinge region by preventing the formation of the disulfide bond with the second heavy chain.
The glycosylation attached to N297 has also been shown to be essential for recognition of FcγRI.29–31 Accordingly, the mutation frequency is strongly reduced at this position during FcγRI selection—although it should be noted that this effect was also observed after anti-CH2 selection (Fig. 2). As the N-glycosylation site involves a Thr or Ser two positions after the Asn, the low tolerance to mutation at T299 in the FcγRI selection confirms the importance of the glycosylation for binding to this receptor. Moreover, comparison of the FcγRI and anti-CH2 selections strongly suggests that D265 (which interacts with the primary N-acetyl glucosamine of the N297-linked glycosylation), V263 and V264 are also involved in FcγRI binding (Fig. 2). This is in agreement with previous studies, showing that replacement of V264 (in human IgG3) or D265 (in human IgG1 and IgG3) impaired FcγRI binding.29–31 Mutation of V264 or D265 to alanine even has been shown to result in increased galactosylation and sialylation.29,30
Additionally, residue N325, which is located in the FG-loop of the CH2 domain in proximity to the hinge region, was considerably more intolerant to mutation in the FcγRI selection (Fig. 2). This was also observed at position P329, although the FcγRI/anti-CH2 ratio was slightly above 1/8. To our knowledge, the effect of N325 on FcγRI binding was not investigated before, but other positions in the FG-loop (A327 and P329) have been assigned a role in binding to this receptor.29
The only residue where the outcome of the present work is contradictory to another study is Q295, which is also located in close proximity to the hinge region and to the carbohydrate. Our study clearly suggests a role for this residue in FcγRI binding (Fig. 2). However, Shields et al. showed that replacement by alanine only results in decreased binding to FcγRIIA, FcγRIIB and FcγRIIIA, but not to FcγRI. One possible explanation may be that mutation to alanine does not reduce the affinity to FcγRI as demonstrated by Shields et al., but other mutations do. As replacement of glutamine by alanine requires two nucleotide changes, this mutation is very rare in a library that was generated by error‐prone PCR. Therefore, this mutation is not taken into account in the present study, possibly explaining the different result.
To our knowledge, this is the first study in which a stability landscape was constructed for an entire domain at single residue resolution. By combining a stability-based in vitro selection with high throughput sequencing, we could analyze the relationship between the sequence and stability of the CH3 domain of human IgG1 without the need for expressing, purifying and measuring each mutant protein individually.
In previous studies, similar approaches have been used for epitope mapping: a protein library was displayed on ribosomes7 or phage8,9 and was selected for ligand binding, followed by sequencing of functional clones. Residues involved in the interaction with the ligand are less tolerant to mutation, thereby allowing the identification of functional epitopes. Alternatively, Chao et al. displayed an EGFR (epidermal growth factor receptor) library on yeast and selected mutants that lost binding to three different monoclonal antibodies.32 Sequencing of the selected pools (in this case, the selected pools contained the “non-binders”) enabled the identification of epitopes.
In those experiments, the library was constructed either by error‐prone PCR7 or by shotgun scanning mutagenesis.8,9 In the latter approach, degenerate oligonucleotides are used, preferentially allowing either the wild-type residue or a defined amino acid to be expressed at particular positions. The advantage of shotgun scanning is the possibility of changing all analyzed positions to the same amino acid (e.g., to alanine). In contrast, application of error‐prone PCR mostly results in mutations that are reached by changing just one nucleotide of a codon. Amino acid mutations requiring two or even three nucleotide replacements are extremely rare. Thus, only a certain set of amino acid mutations, which is dependent on the initial codon, is incorporated. However, this may also be an advantage of error‐prone PCR, as it facilitates analysis of more than just one type of mutation. Moreover, error‐prone PCR allows randomization of a large region in a single step, as opposed to shotgun scanning mutagenesis, which is limited to a small number of positions due to the necessity of using degenerate oligonucleotides.
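The codon dependence of error-prone PCR can be made concrete by listing which amino acids are reachable from a given codon through a single nucleotide change. The sketch below uses the standard genetic code; the function names are illustrative. As an example it takes the glutamine codon CAA and the glutamine-to-alanine replacement noted earlier for Q295.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_step_substitutions(codon):
    """Amino acids reachable from `codon` by exactly one nucleotide change
    (silent changes and stop codons are left out)."""
    wild_type = CODON_TABLE[codon]
    reachable = set()
    for i, base in enumerate(codon):
        for alt in BASES:
            if alt == base:
                continue
            aa = CODON_TABLE[codon[:i] + alt + codon[i + 1:]]
            if aa not in (wild_type, "*"):
                reachable.add(aa)
    return reachable


def min_nucleotide_changes(aa_from, aa_to):
    """Smallest number of nucleotide changes between any codon of
    `aa_from` and any codon of `aa_to`."""
    return min(
        sum(a != b for a, b in zip(c1, c2))
        for c1, aa1 in CODON_TABLE.items() if aa1 == aa_from
        for c2, aa2 in CODON_TABLE.items() if aa2 == aa_to
    )


print(sorted(single_step_substitutions("CAA")))
print(min_nucleotide_changes("Q", "A"))
```

Whichever glutamine codon is present in the gene, alanine is not among the single-step substitutions, which is why this replacement is expected to be rare in an error-prone PCR library.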
In the present study, error‐prone PCR allowed us to randomly mutagenize the entire IgG1-Fc gene (220 amino acid positions) and to analyze the effect of a set of mutations at each position. However, in order to increase the significance and reliability of the data, we analyzed only the change in the total mutation rate. This means that, at a specific position, mutations to different residues were not analyzed separately. In spite of this simplification, the obtained stability landscape of the CH3 domain of IgG1 is of high quality, as demonstrated using various approaches. Firstly, evolutionarily conserved residues were significantly less tolerant to mutation, which is in agreement with other studies.10,33 Secondly, the median ΔΔG values (calculated from predicted ΔΔG values of all 19 possible mutations at a specific position) correlated with the data from the stability landscape. Residues with higher median ΔΔG values were less tolerant to mutation. Thirdly, comparison with published data demonstrated that residues, which are important for the stability,16,22–24 as well as for efficient folding of the CH3 domain,25 are highly intolerant to mutation. Finally, the reproducibility of the data was confirmed by the strong correlation of the mutation rate changes obtained from the two separately performed experiments (Fig. 6b).
Another reason for only showing the change in the total mutation rate instead of analyzing each type of mutation individually is the graphic visualization of the data. Depicting all types of mutations in a single diagram would have been very confusing. However, for interested readers, we included such a stability landscape of the CH3 domain, where all types of mutations are analyzed separately, in the supplemental material (Supplemental Fig. 3).
In vitro selection and sequencing of functional protein variants have also been used by other groups for investigating the relationship between sequence and stability.34–36 In these studies, the libraries were constructed by shotgun scanning. However, in contrast to the studies discussed above, the selected positions were completely randomized using NNK or NNS codons (N is a mixture of all four bases; K is a mixture of G and T; S is a mixture of C and G). This strategy allows the analysis of all amino acid substitutions at the targeted positions. However, in order to sample all (or the majority of) possible combinations of mutations and to avoid overlapping effects of too many mutations in one protein variant, these approaches were limited to mutagenesis of 1–6 residues within one library. Moreover, due to the limitation to low sequence numbers that could be analyzed by Sanger sequencing, these high mutation rates at a low number of positions were necessary in order to be able to detect a significant amount of mutations at a certain position.
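Why NNK and NNS codons are sufficient for complete randomization can be checked by enumerating them against the standard genetic code. The following sketch does this; the helper names are illustrative and not part of any of the cited studies.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degenerate_codons(third_position_bases):
    """All codons with N at positions 1 and 2 and the given bases at position 3."""
    return ["".join(c) for c in product(BASES, BASES, third_position_bases)]


def summarize(codons):
    """Return (number of codons, number of encoded amino acids, stop codons)."""
    amino_acids = {CODON_TABLE[c] for c in codons if CODON_TABLE[c] != "*"}
    stops = [c for c in codons if CODON_TABLE[c] == "*"]
    return len(codons), len(amino_acids), stops


nnk = summarize(degenerate_codons("GT"))  # K = G or T
nns = summarize(degenerate_codons("CG"))  # S = C or G

print("NNK:", nnk)
print("NNS:", nns)
```

Both schemes halve the codon space relative to NNN while still encoding all 20 amino acids, and each retains a single stop codon.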
As outlined in the Introduction, this limitation was overcome by application of high throughput sequencing: Fowler et al. resolved the relationship between the sequence of the WW domain and its function (binding to its peptide ligand) at high resolution.10 However, two parameters were found to influence the mutational tolerance of a certain residue: (i) its involvement in binding to the peptide ligand and (ii) its impact on the structure and stability of the domain. Thus, the observed tolerances to mutation are determined by a mixture of two parameters, making it difficult to interpret the result.
In the present study, these mixed influences on the stability landscape of the CH3 domain were avoided by choosing ligands that interact with the CH2 domain. This strategy was enabled by the partial reversibility of the unfolding pathway of IgG1-Fc, as discussed above. As a consequence, the mutational tolerance of residues in the CH3 domain was determined solely by their impact on the folding and stability of the CH3 domain and was not influenced by interference with ligand binding.
Apart from general insights into the relationship between the sequence and stability of proteins, this study may also prove to be very useful for protein engineering. For example, IgG1-Fc has been engineered for binding to the tumor antigen Her2/neu or to αvβ3 integrin by mutating the C-terminal structural loops of the CH3 domains.37,38 However, changing these loop sequences resulted in decreased thermal stability of the CH3 domains.37 One of the critical factors determining the fitness of a randomly mutated library is the part of the protein that is chosen for randomization. In this regard, the stability landscape of the CH3 domain might be valuable, as it provides information about the impact of specific residues on the stability of the CH3 domain. Mutating loop regions that are more tolerant to mutation might increase library fitness and enable the selection of stable binders.
The gene of human IgG1-Fc was codon optimized for expression in yeast. As deletions or insertions at homopolymers (stretches of identical nucleotides) have been shown to account for a large fraction of 454 sequencing errors,39 stretches of 5 or more identical bases within the Fc gene were eliminated by introducing silent mutations. For surface expression on Saccharomyces cerevisiae EBY100 (Invitrogen, Carlsbad, CA), the Fc gene was BamHI-NotI-cloned into pYD1 (Invitrogen) C-terminally of the Aga2 subunit of the α-agglutinin receptor, resulting in the vector pYD1-Fc and display of the following construct: Aga2p-GlySerLinker-Xpress-Hinge-CH2-CH3 (Supplemental Fig. 1). Subsequently, the IgG1-Fc gene was amplified by error‐prone PCR using the GeneMorph II Random Mutagenesis Kit (Stratagene, La Jolla, CA) and primers flanking the Fc gene, followed by transformation of S. cerevisiae EBY100 with BamHI-NotI-linearized pYD1 and the gel-purified PCR product using the lithium acetate method.40 Homologous recombination in yeast was facilitated by overlapping regions of the linearized vector and the PCR insert. After transformation, the yeast was cultivated in SD-CAA medium [20 g/l glucose, 0.1 M KH2PO4/K2HPO4 (pH 6), 10 g/l (NH4)2SO4 and 0.1 g/l l-leucine (all from Sigma, St. Louis, MO) and 3.4 g/l yeast nitrogen base and 10 g/l bacto casamino acids (both from Difco, BD, Franklin Lakes, NJ)] at 28 °C.
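The homopolymer screen described above (stretches of 5 or more identical bases) can be sketched as follows. This is a minimal illustration; the function name and the example sequence are made up and do not represent the actual Fc gene or the authors' tooling.

```python
import re


def find_homopolymers(seq, min_len=5):
    """Return (start, base, length) for every run of >= min_len identical bases."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)))
        for m in pattern.finditer(seq.upper())
    ]


# Toy sequence: one run of 5 A, one run of 6 T, and one run of 4 C (below threshold).
example = "ATG" + "A" * 5 + "GC" + "T" * 6 + "GG" + "C" * 4 + "A"
hits = find_homopolymers(example)
print(hits)
```

Each reported run would then be a candidate for a silent mutation that interrupts the stretch without changing the encoded amino acid.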
Two separate selection experiments were performed. The only difference in the experimental setup was the ligand that was used for flow cytometric sorting (FcγRI in the first experiment versus anti-CH2 in the second experiment).
After cultivation of the yeast library in SD-CAA at 28 °C overnight, the culture was set to an OD600 of 1 and incubated in SD-CAA at 28 °C for 4 h. Subsequently, surface expression was induced by centrifugation and resuspension to an OD600 of 1 in SGR-CAA [same as SD-CAA, but 20 g/l galactose and 10 g/l raffinose instead of glucose (both from Sigma)]. Incubation in SGR-CAA was performed at 20 °C for 18–20 h, followed by centrifugation and resuspension in phosphate‐buffered saline (PBS)/bovine serum albumin (BSA) [0.2 g/l KCl, 0.2 g/l KH2PO4, 8 g/l NaCl and 1.15 g/l Na2HPO4 anhydrous + 20 g/l BSA (Sigma)]. The cell suspension was aliquoted into microfuge tubes, followed by incubation at 79 °C shaking at 300 rpm in a thermomixer (Eppendorf, Hamburg, Germany) for 10 min and cooling on ice for 10 min.
In the first experiment, the cells were centrifuged and resuspended in 1 μg/ml His-tagged FcγRI (R&D Systems, Abingdon, UK), followed by a washing step and incubation in 1 μg/ml anti-His antibody labeled with Alexa Fluor 488 (QIAGEN, Venlo, Netherlands) and 5 μg/ml anti-Xpress-APC [anti-Xpress antibody (Invitrogen) conjugated to allophycocyanin (APC) using the LYNX Rapid APC Antibody Conjugation Kit (AbD Serotec, Kidlington, UK)]. In the second experiment, the cells were resuspended in 2 μg/ml fluorescein isothiocyanate isomer 1-labeled anti-human IgG CH2 domain antibody (anti-CH2-FITC, clone MK 1 A6; AbD Serotec) and 5 μg/ml anti-Xpress-APC. All incubation steps were performed in PBS/BSA, shaking at 4 °C for 30 min. After a final washing step, the top 5% of the cells regarding positivity for binding to either FcγRI or anti-CH2 were selected by flow cytometric sorting using an Aria FACS (fluorescence‐activated cell sorting) cell sorter. In case of library characterization, the cells were analyzed using a FACSCanto II (both machines from BD).
The sorted cells were centrifuged and washed with PBS/BSA, and plasmid DNA was isolated from the yeast suspension using the Zymoprep Yeast Plasmid Miniprep Kit II (Zymo Research, Orange, CA) according to the manufacturer's protocol with the following modifications: zymolyase incubation was performed at 37 °C for 60 min. After addition of the neutralization buffer, the suspension was centrifuged for 10 min, followed by an additional centrifugation of the supernatant for 5 min. Elution of plasmid DNA was performed twice with 10 μl H2O in each step. To ensure that library diversity was not decreased by a low efficiency of plasmid isolation, we estimated the concentration of pYD1-Fc by mixing part of the eluate with pUC19 (lac+) and subsequent blue/white screening as described previously.13
The selected Fc library was amplified by conventional PCR, followed by transformation of EBY100 with the resulting PCR product and the linearized vector as described above.
For high throughput sequencing, plasmid DNA was isolated from yeast using the Zymoprep Yeast Plasmid Miniprep Kit II. For each library, the CH2 and CH3 domains were amplified in separate PCRs, yielding two amplicons per library. The primers comprised a 21‐bp adaptor sequence (A or B for the sense and antisense primers, respectively, necessary for annealing to the beads, which are used in the 454 sequencing technology), followed by a key sequence (TCAG), an MID (multiplex identifier) and a template‐specific part, resulting in the following primer setup: 5′-adaptorA/B-key-MID-template specific sequence-3′. By the incorporation of MIDs into the oligonucleotides, it was possible to mix various amplicons in one sequencing run and to allocate each read to a particular sample according to the MID that was detected. A schematic overview of the primer setup and amplicon generation is shown in Supplemental Fig. 1.
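Allocating reads to samples by their MID can be sketched as below. It assumes that the adaptor has already been removed, so that each read starts with the key sequence (TCAG) followed by the MID. The MID tags and sample names are placeholders for illustration, not those used in the study.

```python
KEY = "TCAG"

# Placeholder MID tags and sample names (hypothetical).
MIDS = {
    "ACGAGTGCGT": "original_CH3",
    "ACGCTCGACA": "selected_CH3",
}


def assign_sample(read, key=KEY, mids=MIDS):
    """Return (sample, template part) if the read starts with key + known MID, else None."""
    if not read.startswith(key):
        return None
    body = read[len(key):]
    for mid, sample in mids.items():
        if body.startswith(mid):
            return sample, body[len(mid):]
    return None


read = KEY + "ACGAGTGCGT" + "GGGCAGCCC"
print(assign_sample(read))
```

Reads without the key or with an unknown MID return `None` and would simply be discarded.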
Sequencing was performed in two separate runs: in the first sequencing run, the original, the FcγRI sorted library and Fc-wt were analyzed on one‐sixteenth of a picotiter plate (sequencing was performed by LGC Genomics, Berlin, Germany), whereas in the second experiment, the original, the anti-CH2 selected library and Fc-wt were sequenced on one‐eighth of a picotiter plate (performed at the Center for Medical Research, Medical University Graz, Austria). As the main purpose of this study was the construction of a stability landscape of the CH3, the CH3 domain amplicons were adjusted to higher concentrations compared to the CH2 domain in the sequencing sample of the second experiment (Supplemental Table 1) in order to achieve a higher read number for this domain.
Quality distributions of 454 reads were analyzed using Galaxy,41 and low‐quality reads containing bases with a quality score of < 20 were filtered from the data set. The remaining high‐quality reads were sorted according to the MID sequence and subsequently filtered for reads spanning the entire CH2 or CH3 domain, by checking the presence of both 5′ and 3′ MID sequences. Finally, only reads of the expected size of 391 bases for the CH3 domain and 414 bases for the CH2 domain (including adapters) and without frameshifts were translated into amino acid sequences and further analyzed.
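The quality and length filters can be sketched as a simple predicate. This is only an illustration under stated assumptions: the actual filtering was done in Galaxy, and the function name and the representation of qualities as a list of Phred scores are assumptions made here.

```python
def passes_filters(seq, quals, expected_len, min_q=20):
    """Keep a read only if it has the expected length and no base below min_q."""
    return (
        len(seq) == expected_len
        and len(quals) == len(seq)
        and min(quals) >= min_q
    )


# Toy reads of length 6 (the study used 391 bases for CH3 and 414 for CH2).
good = passes_filters("ACGTAC", [30, 35, 28, 40, 22, 20], expected_len=6)
low_quality = passes_filters("ACGTAC", [30, 35, 19, 40, 22, 20], expected_len=6)
wrong_length = passes_filters("ACGTA", [30, 35, 28, 40, 22], expected_len=6)
print(good, low_quality, wrong_length)
```

A read with an insertion or deletion has the wrong length and fails the first check, which removes reads that would shift the reading frame.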
Quality‐filtered library sequences were annealed to the wild-type sequence. At each position X, the sum of mutations to any other amino acid (mutX) was divided by the number of sequences (nseq), yielding the average mutation rate (R) at the position X (RX):

$$R_X = \frac{\mathrm{mut}_X}{n_{\mathrm{seq}}}$$
The change in the mutation rate at the position X (CX) was calculated by dividing the mutation rate at the position X in the selected pool (RX,selected) by the mutation rate at the position X in the original library (RX,original):

$$C_X = \frac{R_{X,\mathrm{selected}}}{R_{X,\mathrm{original}}}$$
For comparison of the two selection experiments, the FcγRI/anti-CH2 ratio was calculated by dividing the change in the mutation rate at position X during FcγRI selection (CX,FcγRI) by the change in the mutation rate at position X during anti-CH2 selection (CX,anti-CH2):

$$\text{FcγRI/anti-CH2 ratio}_X = \frac{C_{X,\mathrm{Fc}\gamma\mathrm{RI}}}{C_{X,\text{anti-CH2}}}$$
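The three quantities defined above can be computed with a few lines of code. The sketch below assumes that all sequences are already translated and have the same length as the wild type; the toy sequences and function names are illustrative. Positions with no mutations in the denominator are reported as NaN, which is a choice made here and is not described in the source.

```python
import math


def mutation_rates(sequences, wild_type):
    """R_X: fraction of sequences that differ from the wild type at each position."""
    counts = [0] * len(wild_type)
    for seq in sequences:
        for i, (aa, wt) in enumerate(zip(seq, wild_type)):
            if aa != wt:
                counts[i] += 1
    return [c / len(sequences) for c in counts]


def ratio_per_position(numerators, denominators):
    """Position-wise ratio, used for C_X and for the FcγRI/anti-CH2 ratio."""
    return [
        n / d if d > 0 else math.nan
        for n, d in zip(numerators, denominators)
    ]


wild_type = "GQPR"
original = ["GQPR", "AQPR", "GQAR", "GEPR"]
selected = ["GQPR", "GQPR", "AQPR", "AQPR"]

r_original = mutation_rates(original, wild_type)
r_selected = mutation_rates(selected, wild_type)
change = ratio_per_position(r_selected, r_original)
print(r_original, r_selected, change)
```

In this toy example, mutations at the first position are enriched by selection (C = 2), whereas mutations at the second and third positions are depleted (C = 0). The same `ratio_per_position` helper can be applied to the two lists of C values to obtain the FcγRI/anti-CH2 ratio.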
To evaluate the significance of observed differences in the mutation rate changes, the residues were separated into two groups depending on whether or not they fulfilled a certain criterion [i.e., (i) being evolutionarily conserved or not and (ii) being located at the interface within 4 Å of the second CH3 domain or not]. Subsequently, the significance of the difference in the mutation rate changes between the two groups was evaluated using Student's t-test (Microsoft Excel).
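The group comparison can be illustrated with a hand-written two-sample t statistic. The sketch assumes the classical Student's t-test with pooled (equal) variances; the source does not state which variant was chosen in Excel, and the values below are made up.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for a pooled-variance t-test."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical mutation rate changes for two groups of residues.
conserved = [1.0, 2.0, 3.0]
non_conserved = [2.0, 3.0, 4.0]
t_value, dof = student_t(conserved, non_conserved)
print(round(t_value, 4), dof)
```

The p-value follows from the t distribution with the returned degrees of freedom, for example via a statistics package or a spreadsheet function.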
Sequences of the CH2 and CH3 domains of IgG1 were obtained from the International ImMunoGeneTics Information System®† and aligned using Jalview. The following sequences were included in the alignment: IgG1-CH2 domains from Bos taurus (bovine), Canis lupus familiaris (dog), Equus caballus (domestic horse), Homo sapiens (human), Macaca fascicularis (crab‐eating macaque), Macaca mulatta (Rhesus monkey), Mus musculus (house mouse), Ovis aries (domestic sheep), Pan troglodytes (chimpanzee), Rattus norvegicus (Norway rat) and Sus scrofa (pig) and IgG1-CH3 domains from B. taurus (bovine), C. l. familiaris (dog), Cavia porcellus (domestic guinea pig), E. caballus (domestic horse), H. sapiens (human), M. fascicularis (crab‐eating macaque), M. mulatta (Rhesus monkey), M. musculus (house mouse), O. aries (domestic sheep), P. troglodytes (chimpanzee), R. norvegicus (Norway rat) and S. scrofa (pig). Residues that are identical in all species were defined as being evolutionarily conserved.
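The conservation criterion (identical residue in all aligned species) can be expressed as a column-wise check. The sketch below uses a made-up toy alignment, and the treatment of gap columns as non-conserved is an assumption.

```python
def conserved_positions(alignment):
    """Return 0-based indices of columns with the same residue in every sequence."""
    return [
        i
        for i, column in enumerate(zip(*alignment))
        if len(set(column)) == 1 and column[0] != "-"
    ]


# Toy alignment of three equally long sequences; "-" marks a gap.
toy_alignment = ["GQPRE", "GQPKE", "GQ-RE"]
conserved = conserved_positions(toy_alignment)
print(conserved)
```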
The computer algorithm FoldX19–21 was used to mutate each residue of the CH3 domain to the other 19 amino acids to predict the effect of any possible point mutation on stability in terms of change in free energy of unfolding upon mutation.
First, we repaired the crystal structure of human IgG1-Fc (PDB ID 1OQO) by applying the FoldX command <RepairPDB>, which identifies residues with bad torsion angles or van der Waals clashes. Second, the FoldX command <BuildModel> was applied to the CH3 domains of both chains of the repaired structure. We used this function to exchange every amino acid of the CH3 domain for each of the other 19 amino acids, resulting in values for the change in free energy of unfolding for each mutation at each position (ΔΔG = ΔGmutant − ΔGwild type). To estimate the overall effect of mutation on the stability of IgG1-Fc, we calculated the median of the ΔΔG values for each position in the CH3 domain.
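The per-position summary can be sketched as follows. It assumes that the ΔΔG values have already been collected from the FoldX output into a mapping of position to substitutions; the data structure, the position labels and the numbers are hypothetical, and only three substitutions per position are shown instead of 19.

```python
from statistics import median


def median_ddg_per_position(ddg):
    """ddg maps position -> {mutant amino acid: ΔΔG}; return position -> median ΔΔG."""
    return {pos: median(values.values()) for pos, values in ddg.items()}


# Made-up values for two positions.
toy_ddg = {
    1: {"A": 3.0, "F": 0.5, "W": 1.2},
    2: {"A": -0.1, "G": 0.4, "S": 0.2},
}
medians = median_ddg_per_position(toy_ddg)
print(medians)
```

Using the median rather than the mean limits the influence of a few substitutions with extreme predicted ΔΔG values at a given position.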
This work was supported by the Christian Doppler Research Association (Christian Doppler Laboratory for Antibody Engineering), the company F-star and the Austrian Science Foundation (FWF W1224; Doctoral Program on Biomolecular Technology of Proteins). M.H. would like to acknowledge support by the BOKU DOC scholarship, and J.G. would like to acknowledge support by the Austrian Genome Research (GEN-AU) Grant “Non-coding RNAs” 820982.
Edited by S. Sidhu