|Home | About | Journals | Submit | Contact Us | Français|
High-resolution homology models are useful in structure-based protein engineering applications, especially when a crystallographic structure is unavailable. Here, we report the development and implementation of RosettaAntibody, a protocol for homology modeling of antibody variable regions. The protocol combines comparative modeling of canonical complementarity determining region (CDR) loop conformations and de novo loop modeling of CDR H3 conformation with simultaneous optimization of VL-VH rigid-body orientation and CDR backbone and side-chain conformations. The protocol was tested on a benchmark of 54 antibody crystal structures. The median root-mean-square-deviation (rmsd) of the antigen binding pocket comprised of all the CDR residues was 1.5 Å with 80% of the targets having an rmsd lower than 2.0 Å. The median backbone heavy atom global rmsd of the CDR H3 loop prediction was 1.6 Å, 1.9 Å, 2.4 Å, 3.1 Å and 6.0 Å for very short (4–6 residues), short (7–9), medium (10–11), long (12–14) and very long (17–22) loops respectively. When the set of ten top-scoring antibody homology models are used in local ensemble docking to antigen, a moderate to high accuracy docking prediction was achieved in seven of fifteen targets. This success in computational docking with high-resolution homology models is encouraging, but challenges still remain in modeling antibody structures for sequences with long H3 loops. This first large-scale antibody-antigen docking study using homology models reveals the level of “functional accuracy” of these structural models towards protein engineering applications.
Therapeutic antibodies are powerful tools for the prevention and treatment of human and infectious disease because of their high affinity and specificity for target antigens. Since the first U.S. Food and Drug Administration approval of a therapeutic antibody over two decades ago, antibody drugs have become available to treat cancer, infectious and cardiovascular diseases, arthritis, inflammation and immune disorders,1 and new therapeutic antibodies are driving further growth of the biotechnology industry. The discovery of high-affinity binders for target antigens is efficiently accomplished by the use of combinatorial libraries in various display platforms such as phage, ribosome, E. coli or yeast,2–4 and structure-based computational techniques complement experimental methods for antibody engineering. Structures of antibodies with their antigens can yield insight into biological phenomena or drug and disease mechanisms, and the increasing sophistication of computational techniques makes it possible to increase antibody-antigen binding affinity5,6 or deduce the structural origin of such affinity maturation.7 However, a crystal structure may not be readily available for most newly developed antibody sequences, in which case a high-resolution antibody homology model is needed to perform structure-based in-silico antibody engineering. Here, we describe the development and implementation of RosettaAntibody, a new protocol for high-resolution homology modeling of antibody variable domains (Fv).
Homology models can be particularly valuable if they are useful for docking or structure-based protein engineering such as increasing stability. Presumably, such downstream applications will be difficult since they require highly accurate structural models so that the atomic interactions across an interface can accurately capture the energetics of binding. Few tests have been published of the ability to use homology modeled structures in downstream predictive applications. The blind docking challenge known as CAPRI8 (Critical Assessment of Prediction of Interactions) has presented several homology modeling plus docking combination targets, with some predictive successes and some failures.9 Because antibodies are well studied structurally, they offer an ideal model system to test homology-model based docking. Template identification is considerably simpler than in generic homology tasks, and antibodies isolate the critical challenge of loop modeling which is likely to be of general interest for docking other proteins.
Whitelegg and Rees10 have authored an exhaustive review that summarizes the main issues in structure prediction of antibody variable domains. Briefly, the framework residues of the light and heavy chain variable domains (VL and VH) serve as the scaffolding on which the six complementarity determining region (CDR) loops are erected. The framework sequence and its β-sheet structure are generally well conserved, and the conformations of the non-H3 CDR loops are usually restricted to canonical structures that may be determined using sequence-based rules.11–13 The difficulties in modeling the CDR H3 loop are well known:12–16 no determinative rules can accurately predict structure from sequence for the entire loop region. Therefore, the H3 conformation is predicted de novo, which is challenging due to the hyper-variability in the sequence and length of the loop.17 At the same time, the continuous increase in the knowledge-base of solved antibody crystal structures aids in the comparative modeling aspects of Fv structure prediction.
Other groups have previously described protocols for Fv homology modeling.18–20 The WAM10 (Web Antibody Modeling) server incorporates the antibody structure knowledge-base into a practical application that has been used by several groups, including our own, to build homology models for use in antigen docking21–25 or to visualize the topographical location of binding site mutations.26 Most of these studies required the use of experimental information such as the effect of site mutagenesis on binding in order to compensate for uncertainties in the model structures.
For use in the prediction of antigen interactions, the requirement that the paratope be modeled accurately makes antibody homology modeling a harder problem than it first appears. The geometry of the antigen binding surface is directly influenced by the particular canonical loop templates used for modeling of non-H3 CDR loops, the VL-VH orientation, and the accuracy of the CDR H3. There are also second-order effects on the binding site through the interactions of the CDR loops L2, L3 and H1 with H3,13 and the influence of the relative orientation of the VL and VH rigid bodies on the placement and interaction of the CDR loops.27
Our main objectives are (1) to combine the existing antibody knowledge base with the high-resolution backbone and side-chain modeling capabilities of the Rosetta structure prediction suite28 to generate high quality antibody Fv homology models and (2) to test those models for computational docking. To address all the relevant degrees of freedom in Fv modeling, we combine template selections with ab initio CDR H3 loop modeling and simultaneous optimization of the CDR loop conformations and the VL-VH orientation. The homology models are tested in antibody-antigen docking predictions, including a newly developed ensemble docking technique29 that mitigates the uncertainty in the homology models by using a multiple-copy representation. To our knowledge, this is the first study to systematically benchmark docking using antibody homology models to delineate modeling aspects that are crucial for accurate prediction of the antigen interaction. Our ultimate goal is to build high-resolution antibody models that may be reliably used in various downstream docking and design applications.
Figure 1 summarizes the antibody homology modeling algorithm which we have named RosettaAntibody. We start with the selection of the VL and VH framework region templates and assembly of the β-barrel, followed by the grafting of the non-H3 CDR loop conformations. The CDR H3 conformation is predicted via loop modeling, and the entire structure is refined using Monte Carlo minimization and the Rosetta all-atom energy function. To isolate the introduction of error in each step of the model building process, we first test the RosettaAntibody protocol for its ability to recover the crystallographic conformation (referred to henceforth as native) of the CDR loops given the native framework backbone (native loop recovery test). Subsequently, we test the ability to assemble a homology model starting from sequence in the absence of any information about the native structure (homology modeling test).
We analyzed the variation in relative VL-VH orientations by performing an hierarchical clustering analysis30 on a set of 230 antibody crystal structures (non-redundant H3 sequences, resolution 2.8 Å or better). To focus on the domain interface, we clustered models based on the Cα rmsd of interfacial VH residues (36–39, 46–49, 84–94 and 103–110; Chothia numbering12) after superposition of the VL domains. At a radius of 2 Å, the structures collapsed into seven clusters. The largest cluster comprised 87% of the structures and consisted of κ-type light chains. λ-type light chains form the second largest cluster with 7% of the structures, due to the markedly different domain orientation in antibodies with this chain type.27
Given the relatively tight clustering of the domain interface orientation, an average β-barrel approximation to represent the relative VL-VH orientation is reasonable.31 However, we follow Dyekjaer and Woods19 and use structural superposition to an antibody crystal structure from a database. That is, for each chain a template structure for the framework region is selected from the database by maximum BLAST32 bit score against the Fv chain sequence. Subsequently, the templates are oriented relative to each other using structural superposition onto a crystal structure with the highest sequence similarity for the entire Fv region. This approach captures the κ-type/λ-type differences and variations within those classes, and it reduces steric clashes across the domain interface that may result from an average β-barrel conformation.
The first section of Table I (VH RMSD) lists the backbone heavy atom (N, C, Cα, O) rmsd-to-native of the VH framework in the homology models created by homologous superposition alone (we will examine VL-VH rigid-body minimization later). The median rmsd value over 54 targets is 1.4 Å, with an rmsd under 2.0 Å for 45 of 54 targets. Most of the deviation arises from the relative VL-VH orientation since the individual VL and VH framework templates themselves have a median rmsd of 0.5 Å relative to the native backbone (data not shown). Sequence similarity between the query and template used for structural superposition varies between 75 and 85%, and the similarity does not have a strong influence on the resulting VH rmsd. Rather, individual residues at the domain interface appear to exert a controlling influence on the relative orientation, creating the relatively small (~2 Å) differences in the orientation between different antibodies.
The VL-VH orientation obtained using template superposition does not remain static during homology modeling, but is modified using rigid-body minimization of the domains subsequent to the low-resolution H3 modeling (see Methods). This step is carried out to eliminate steric clashes across the domain interface that result from threading the query Fv sequence on the framework region templates and typically results in distinct VL-VH orientations being obtained in individual homology models. The recovery of the native VL-VH orientation during homology modeling is extremely challenging not only due to the high-resolution nature of the problem (~ 2 Å), but also because the orientation is conditional on the specific CDR H3 conformation generated during de novo modeling. Therefore, the rigid-body minimization has been designed to eliminate unphysical interactions across the domain interface via minor rigid-body perturbations from the starting β-barrel, without explicitly attempting to recover the native VL-VH orientation.
The first section of Table I also shows the VH framework rmsd (after VL framework superposition) in the homology model with the lowest CDR H3 rmsd-to-native (the low-rmsd model). For 75% of the targets, VL-VH rigid-body minimization changes the VH rigid-body position by 0.5 Å or less compared to the starting orientation in the β-barrel. For these targets, such minor rigid-body movement together with the side-chain repacking is sufficient to resolve high-energy interactions across the domain interface in the starting orientation. For the remaining 25% of the targets, VL-VH rigid-body minimization moves the VH domain by 0.6–2.0 Å. The larger rigid-body change may be necessary either because the starting β-barrel template is sub-optimal for the query antibody sequence or because the inter-domain interactions need to be optimized given the H3 conformations generated during the low-resolution search.
Based on the analysis of antibody crystal structures available at the time, Martin and Thorton14 and Chothia and co-workers12 spelt out the canonical paradigm for predicting non-H3 CDR loop conformations. Typically, the canonical class assignments mirror the loop lengths, and additional classes and sub-classes arise upon considering the identity of specific residue positions within the loop and the framework regions. There have been reports of antibody crystal structures with new canonical conformations that were previously unseen,33 or modifications to the existing rules for assigning canonical classes for individual CDRs.34 Recently, Whitelegg and Rees10 have revisited the issue of canonical assignments using ~100 antibody crystal structures. Their detailed analysis has confirmed the essential robustness of the original canonical rules for a majority of the loop conformations, but has also produced some new insights: in multiple cases, CDR loops were found to be conformational members of a given canonical class without complying in every detail with the sequence constraints (for instance, 11-residue L1 loops). There were also instances of loops complying with the sequence requirements for a particular canonical class, but belonging to an alternate class on conformational grounds, as is the case with the H2 loops of an anti-gamma-interferon antibody (Protein Data Bank35 [PDB] ID 1b2w) and Fab fragment of a human IGG1 kappa autoantibody (1vge). It may also be deduced from their analysis that with limited exceptions, there is essentially a one-to-one correspondence between loop length and the canonical class assignment.
Due to the evidence that the loop length is an appropriate surrogate for the canonical class, we have chosen to identify the correct loop template based on the length alone without resorting to the rules for canonical class assignments. For each non-H3 CDR, the loop conformation to be used as the template for modeling is selected by maximum BLAST32 bit score to the query loop sequence. This loop template is grafted to the VL or VH framework by optimal superposition of the backbone atoms of two overlapping residues at each end of the loop. The BLAST search is performed against the set of known CDR loop conformations of the same length, without restricting the search to loops that belong to the same canonical class as the query. Any global sequence similarity criterion to identify loop templates may mask minor conformational effects, such as the rotation of a single peptide bond, that arise due to the identity of individual CDR or framework residues. To account for this possibility, the later stages of the protocol have been designed to incorporate minimal backbone flexibility in the non-H3 CDR loops so as to relax the loops into a conformation that is dictated by both intra-loop and CDR-framework interactions.
Table 1 and Figure 2 summarize the local and the global backbone heavy atom rmsd values for non-H3 CDR loops in 54 targets. Accurate predictions of the non-H3 CDR conformations to within 1 Å of the crystal conformation are achieved without first considering the canonical class assignments. A low value of the local rmsd – where the loop residue backbone atoms are optimally superposed on the reference loop without regard for the framework location – indicates that an accurate template has been chosen for the loop conformation itself. For CDR L2, the local rmsd is under 0.5 Å for 90% of the targets and under 1 Å for all targets (Figure 2A). For the other non-H3 CDRs, the local rmsd is under 0.5 Å for 60–75% of the targets depending on the CDR and under 1 Å in all but ten individual instances (Table II). These ten instances also fail by canonical classification11–13 suggesting that the high rmsd values are due to a non-canonical conformation in the crystal structure. Extensive backbone minimization of the non-canonical non-H3 loop conformations would be a productive addition in a future version of our Fv modeling protocol.
To test whether our predictions had identified the correct canonical loop class and whether the sequence based rules would have fared better, we examined the targets with high local rmsd (< 1 Å). In several cases, the loops did not match any canonical classification, and predictions might have to be made using sequence similarity to known loop conformations as we have done here. In cases that did match canonical rules, the rmsd of the canonical template was similar to that chosen using the BLAST search (data not shown). The large number of antibody structures in the PDB enables the use of simple database search methods to identify CDR loop conformations in most cases. However, the finite size of the database also imposes limits on the accuracies that are possible for loop lengths for which few database examples are available such as 10 and 12-residue Vκ L1 loops, or CDRs from Vλ light chains. In other cases, the accuracy is limited by the natural variation in the crystal conformation even for loops with identical sequences (for instance 16 and 17-residue L1 loops) such that the loop rmsd depends on the specific template chosen.
In addition to measuring the deviation of the loop itself, it is useful to measure the error in the global placement of the loop after grafting the CDR loop onto the scaffold. The global rmsd is the deviation of the loop residue backbone heavy atoms calculated after superposition of the framework residues, and it indicates the accuracy of the framework and the structural superposition of the loop. Figure 2b shows that when the loop template is grafted onto the native framework from the target crystal structure, the global loop conformation is modeled to an accuracy of 1 Å or better for the majority of cases. Figure 2c shows that grafting the loop template onto a scaffold identified using homology yields similar quality structures, indicating that the additional rmsd deviation from the crystal conformation arise mostly due inaccuracies in grafting the loop to the framework and not due to the choice of the loop itself or errors in the homology framework.
To test our loop building procedure in isolation, we remove the CDR H3 from the crystal structure of the target and attempt to predict its conformation using loop modeling. Loop modeling consists of a low-resolution Monte Carlo-based backbone fragment assembly followed by smaller perturbations to the ϕ, ψ, and χ dihedral angles in high resolution, using cyclic coordinate descent (CCD)36 to close the loops (see Methods). We perform these so-called “native recovery tests” on 54 targets with CDR H3 loops comprising between 4 and 22 loop residues.
We classify the loops as very short (4–6 residues), short (7–9), medium (10–11), long (12–14) and very long (17 and above). In the loop modeling field 11–13 residue loops are considered long,37,38 and accuracies approaching 1–1.5 Å loop rmsd for these lengths have been demonstrated only under ideal loop modeling conditions in which the structure of the backbone and side-chains and crystal contacts outside the loop region are retained from the crystal structure. In recognition of the challenges in modeling long loops, 75% of the targets in this study contain 11 or fewer residues in the CDR H3. However, the CDR H3 length diversity of the murine and human H3 repertoire is larger, with approximately 15% and 40% respectively of the human and murine antibody sequences comprised of Kabat CDR H3 lengths longer than 11 residues.17 To capture this diversity, we have included 14 targets with 12 or more residues in the CDR H3.
For the native recovery tests, the second section of Table I (H3 RMSD) lists the global backbone heavy-atom rmsd of the CDR H3 loop in the lowest all-atom energy model among all conformations (the “low-score model”) and the lowest global CDR H3 rmsd model among the ten lowest scoring models (the “low-rmsd model”). The low-rmsd model is not known in a blind structure prediction exercise, but serves as a useful metric to rate the ability of the protocol to discriminate near-native CDR H3 loop conformations. Histograms showing the accuracy of CDR H3 loop modeling for all loop lengths are shown in Figure 3.
We first present the analysis for loop lengths under 11 residues. For this set, the low-score models have a CDR H3 global rmsd-to-native of 1.25 Å or better for 16 of 40 targets, and the rmsd is under 2 Å for 30 targets (Figure 3a). If the pool is expanded to the ten lowest scoring conformations, 26 targets (~65%) have a rmsd under 1.25 Å and 37 targets (~90%) have a model with rmsd under 2 Å (Figure 3b). The median rmsd values of the predictions increase with loop length. For the low-score models, the median rmsd values are 1.0 Å for the very short loops, 1.4 Å for the short loops and 1.7 Å for the medium length loops. The corresponding values for the low-rmsd models are 0.6, 0.8 and 1.1 Å respectively.
For several targets, the loop modeling algorithm samples adequately, but the final discrimination of near-native CDR H3 loop conformations is limited by errors in the scoring function. For instance, of the ten targets that have rmsd < 1.25 Å for the low-rmsd models but not the low-score model, six produce over 15% of structures with rmsd under 1.25 Å in the 200 top-scoring homology models. In fact, for some of these targets the incorrect models score better than the native crystal structures. In other targets, sampling is a limiting issue, including two targets with an extended H3 base conformation (1bql and 1qbl) and five loops (1kem, 2jel, 2adf, 1kb5 and 2aju) that have so-called “atypical conformations” as defined by Whitelegg & Rees.10 None of the CDR H3 backbone dihedral angles in these targets are in disallowed regions of the Ramachandran plot. Low sampling is probably due to poor representation of fragments required to create a CDR H3 loop with atypical conformation in the pool of allowable fragments. Subsequent VL-VH rigid-body minimization improves the sampling of decoys with CDR H3 global rmsd-to-native under 1.25 Å for targets 2jel and 1kb5 (but not the final discrimination of the near-native loop conformations), suggesting that the lack of sampling diversity in the native recovery tests might be due in part to unfavorable interactions across the domain interface in the target crystal structures.
None of the 14 targets with 12 residue or longer CDR H3 loops has a low-score or low-rmsd model with CDR H3 global rmsd under 1.25 Å. Two 12-residue loops (2g5b and 2fjh) and one 14-residue loop (1bj1) have low-score models with rmsd < 2.0 Å. If the low-rmsd models are considered, all long-loop targets except 1fbi have a model under 3 Å, with a median of 2.5 Å.
For very-long loop targets 1hzh (length 18) and 1g9m (length 19), the low-score and low-rmsd models coincide and have an rmsd value of 2.7 Å and 2.3 Å respectively. This is an impressive result, considering the conformational space that is potentially sampled by a loop of this length.
Having benchmarked the accuracy of H3 loop modeling under ideal conditions, we now combine the three modules of β-barrel assembly, CDR grafting and H3 modeling to emulate a blind prediction of the complete Fv structure. The β-barrel and the non-H3 CDR loops are modeled using homologous template structures to create the acceptor scaffold, and the CDR H3 is built on this scaffold. Furthermore, the homology model subsequently undergoes backbone minimization of the non-H3 CDR loops and rigid-body movement of the VL and VH domains relative to each other. Therefore, the conformations of these regions in the final homology model are different from that in the starting acceptor scaffold.
The second section of Table I (H3 RMSD) lists values of the CDR H3 global backbone heavy atom rmsd values for complete homology models. Of the 40 targets with 11-residue or shorter CDR H3 loops, six targets (compared to sixteen in the native recovery test) have a low-score model with a CDR H3 rmsd-to-native of 1.25 Å better, and 20 targets (50%) have an rmsd better than 2 Å. If the low-rmsd models are considered, 31 targets (compared to 37 in native recovery) return a prediction better than 2.0 Å. Although the accuracy clearly suffers from the scaffold errors, these results are encouraging given that no aspect of the backbone or side chain conformations of the target crystal structure is used during the modeling.
For the 10 targets with loop lengths 12–14, the rmsd values of the low-score and low-rmsd models ranges from 1.6 to 5.8 Å. For the very-long loop targets, the rmsd of the low-score models range from 4.6 Å for target 1g9m to 13.6 Å for target 1f58. Three of the four very long-loop targets have a low-rmsd model with an accuracy of 3.8 Å or better.
Eighteen targets with CDR H3 lengths between 7 and 11 inclusive are also featured in the original WAM study.31 In the WAM publication, the final H3 loop conformation was chosen using one of so-called VFF, accessibility or rmsd screens. The rmsd screen computes the local rmsd between the loop model and known crystal structure H3 loop conformations of the same length and combines this with the sequence similarity of the two loops to rank the loop model. In ten of the eighteen targets that overlap between the respective studies, the CDR H3 rmsd in the rmsd-screened WAM31 model is lower than or equal to that of the low-score RosettaAntibody model. However, in two-thirds of the targets, there is at least one prediction among the ten best RosettaAntibody models that has an rmsd lower than or equal to that of the WAM prediction. These results suggest that an improved strategy to discriminate near-native conformations among the ten best models either by improvements to the scoring function or by the use of a WAM-type rmsd-screen can significantly enhance the accuracy of RosettaAntibody’s final prediction.
Figure 4 provides representative illustrations of high, medium and lower quality successful loop predictions obtained from homology modeling. The model for the 9-residue loop in D3H44 Fab (1jpt, Figure 4A) has a backbone rmsd value of 0.9 Å, and the high prediction quality is evident from the accurate superposition of the backbone and side chains in the homology model and the crystal structure. For jel42 Fab (2jel, Figure 4B), the prediction has an rmsd of 1.4 Å. In this medium-quality prediction, all backbone atoms are accurately positioned relative to the crystal except those of the glutamate and glutamine residues at the apex. For antibody 2H1 (2h1p, Figure 4C), an 11-residue loop target, the rmsd of the predicted loop is 1.7 Å. The loop model is faithful to the contours of the crystal loop conformation, but the backbone inaccuracies are amplified in the positioning of the side chains.
While local and global loop rmsd measures speak to the accuracy of structure prediction itself, the binding surface backbone heavy atom rmsd gauges the suitability of the model for applications involving antigen interactions. This rmsd is computed over the CDR residues after optimal superposition of all CDR loops, thus incorporating information in the CDR conformations within each domain as well as the relative domain orientation. The third section of Table I lists values of the binding surface rmsd for homology models for all the targets. The lowest scoring homology models have a median rmsd value of 1.5 Å and the binding surface is modeled to an accuracy of 2 Å or better in 80% of the 54 targets.
The accuracy with which RosettaAntibody predicts the geometry of the antigen binding surface is shown in Figure 5 for two targets. The figure shows all six CDR loops structurally superposed with the CDR loops in the crystal structure for the ten lowest scoring RosettaAntibody models. The CDR H3 is located at the center of the binding surface, and the loop conformation clearly varies among the different models. The figure also highlights the spread in the positioning of the non-H3 CDR loops resulting both from the VL-VH rigid-body minimization and the conformational variability due to direct minimization of these loops. Such variability may compensate for uncertainties in the modeling process.
An important motivation driving high-resolution homology modeling of antibodies is that the resulting models are useful for downstream applications such as antibody-antigen docking. Here, we have benchmarked the performance of RosettaAntibody models in local antibody-antigen docking of fifteen targets with known complex structures. Local docking, where starting positions are generated from random perturbations from an input structure, attempts to identify a near-native complex conformation when sampling conformation space up to 30 Å rmsd (ligand backbone heavy atom) from the starting configuration. Local docking is often used to make blind structural predictions9,39,40 of an antibody-antigen complex when some biological information is known about interacting residues (e.g. mutational information) to provide a reasonable estimate of the binding orientation,40 and its success is a necessary condition for successful global docking.41
The docking studies were carried out using RosettaDock,41 a Monte Carlo-based docking algorithm that simultaneously optimizes rigid-body position and side-chain conformation. RosettaDock has been used successfully by several groups in the CAPRI (Critical Assessment of PRediction of Interactions) blind docking challenge.9,40,42,43 As an input to the program, we use the unbound structure of the antigen except in cases where such a structure was unavailable in the PDB, in which case we use the bound backbone coordinates. The docking models are assigned CAPRI-style accuracy ratings of “high”, “medium”, “acceptable” or “incorrect” that depend on the rmsd-to-native of the ligand position, the interface rmsd to native and the fraction of native residue-residue contacts that are recovered in the model (see Methods).44 It is typically challenging to achieve high-quality docking predictions, especially with homology models and in the absence of biological information regarding the interaction site of the protein partners. For instance, in the latest CAPRI rounds (6 through 12) predictor groups worldwide produced a high-quality prediction for only one of seven targets, and medium-quality predictions for three others.45
Consistent with the CAPRI assessment criteria, we report the quality of a prediction as that of the most accurate structure among the ten lowest scoring docking decoys. Additionally, we stipulate that the docking run produces an ‘energy funnel’ if at least five decoys of medium or high quality are present in the ten lowest scoring decoys. A funnel indicates convergence of the docking simulation towards the native complexed structure.46 Creation of one model with prediction quality of at least ‘medium’ among the ten top-scoring docking models is a fairly generous criterion for docking success; the presence of an energy funnel is a conservative criterion. Table III presents a summary of the docking results, and the docking results of a monocolonal antibody complexed with surface antigen 1 of Toxoplasma gondii (1ynt) are used throughout as an example.
The results of docking using antibody and antigen crystal structures (bound structure if unbound is unavailable) represent the theoretical maximum of RosettaDock’s prediction accuracy for a given complex. Figure 6a illustrates a typical result for crystal structure docking using a plot of interface score vs. ligand rmsd for target 1ynt. The presence of an energy funnel is evident in the figure as nine of the ten lowest energy structures are of medium or high quality. Thirteen of the fifteen targets in our docking benchmark returned a medium or high quality prediction and seven of these cases exhibited an energy funnel. These results corroborate previous results47,48 that while significant challenges still exist, accurate antibody-antigen docking using crystal structures is feasible and is a currently an active area of research.
Subsequent docking simulations, carried out using the native recovery model with the CDR H3 loop modeled de novo or with the RosettaAntibody models that retain no aspect of the native antibody conformation, reveal the extent to which deviations from the antibody crystal structure decrease docking accuracy. With the lowest scoring native-recovery model, we obtained six medium-quality and one high-quality prediction with two energy funnels, compared to seven funnels for crystal structure docking. While docking of target 1ynt with the native recovery model produces an energy funnel (Figure 6b), the best model among the ten lowest energy structures is of medium quality, compared to the high quality prediction obtained in crystal structure docking. Docking with the lowest scoring RosettaAntibody model results in lower quality predictions than those obtained using the lowest scoring native recovery models; a medium quality docking prediction was obtained for five targets, but none of the targets returned a funnel. Docking using the WAM models yields similar results with four predictions of medium quality and one funnel. For target 1ynt, docking using the RosettaAntibody model (Figure 6d) shows comparable results to that using the WAM model (Figure 6c): both produce a medium quality prediction but lack an energy funnel. These results underscore the challenges of using even high resolution, moderately accurate structural models towards functional ends such as antibody docking.
It is of interest to investigate whether docking would have been more successful if the antigen were docked to the low-rmsd model (instead of the low-score model) from the native recovery and homology modeling simulations. Such a case would be relevant if improvements in the energy function allowed the correct identification of the closest model. Docking with the low-rmsd native recovery model produces one extra high-quality prediction, but the results are otherwise similar to docking with the low-score model. Docking with the low-rmsd RosettaAntibody model not only results in one extra high and medium-quality prediction each, but also produces three funnels where there were none in the docking with the low-score models. The results for 1ynt using the low-score and low-rmsd RosettaAntibody models (Figure 6d and 6e) reflect this trend. The low-score model achieves a medium quality prediction without a funnel, while the low-rmsd model achieves a high-quality prediction with an energy funnel. The success of the low-rmsd models suggests that for several targets, structural models exist among the ten low-scoring RosettaAntibody models that are accurate enough for downstream functional applications.
A typical real-world antibody engineering application may involve starting with the sequence of a new antibody for which a crystal structure is unavailable, building a homology model from sequence, and using this to predict the paratope-epitope interactions with the antigen via computational docking. In this scenario, the low-rmsd antibody model is not known a priori. One may perform docking with a single antibody model (for instance, the low-score model), in which case the docking accuracy may be limited by the quality of the CDR H3 loop prediction in the particular instance. Alternatively, one may use multiple antibody models (for instance, the ten top-scoring models) in independent docking runs, with a proportional increase in the computational cost and the difficulty of discriminating the near-native docking models by comparing the independent runs.
Our lab recently developed an ensemble docking method that can dock multiple input structures simultaneously in a single docking run.29 The method was shown to outperform docking protocols that were based on a single-copy representation of the interacting proteins, but without a proportional increase in the computational cost. In the specific application to antibody-antigen docking, the ensemble method allows us to dock the ten (or more) top-scoring antibody models, and is especially useful in situations where an antibody model with a near-native H3 loop conformation is among the ten top-scoring models, but is not the low-score model.
Ensemble docking using the set of the ten top-scoring antibody models produces two high-quality and five medium-quality predictions, together with funnels for three targets. Ensemble docking thus achieves an overall accuracy similar to that obtained using the low-rmsd RosettaAntibody model, but does so even in cases where the identity of the low-score and low-rmsd models do not coincide. These results are also an obvious improvement over those obtained using the single low-scoring RosettaAntibody or WAM models (five and four medium-quality predictions respectively).
Figure 7 depicts the medium-accuracy ensemble docking prediction (ranked 8th by interaction energy) of HyHEL-5 Fab-bobwhite quail lysozyme complex (1bql). This model emerged from the docking of the ensemble of RosettaAntibody models to the unbound antigen (1dkj); the latter deviates by 0.63 Å from its bound form. The antibody homology model selected from the ensemble during docking is the one with the third-lowest H3 rmsd to the native structure. In the docking model, the ligand rmsd-to-native in the docking model is 1.81 Å, the interface rmsd is 1.36 Å, and 76% of the native residue-residue contacts are recovered. Interestingly, docking with the bound antibody crystal and the unbound antigen achieves similar interface rmsd and native contacts, but a poorer ligand rmsd (4.4 Å). A backbone conformational change of the CDR loops is required to relieve the steric clashes that occur when the unbound lysozyme is superposed on the bound lysozyme in the complex. Indeed, in the antibody homology model in the predicted complex structure, the CDR L3 and the apex of CDR H3 move away from their respective bound conformation to avoid the clash, and the CDR H3 loop conformation is 1.9 Å rmsd from the native structure.
We have developed an antibody modeling protocol which is accurate enough to enable successful antibody-antigen docking starting with the antigen structure and the antibody sequence alone. This approach will be useful for new therapeutic antibodies for which experimentally determined coordinates are not available, as docking models can help identify epitopes, and structure-based designs can further engineer affinity or selectivity. Further, while antibodies are important to the biotechnology and pharmaceutical industry, the docking of homology models is of broad interest to biology in general. This study on antibodies serves as a model of the potential for more routine use of homology models in docking and design applications. Of course, more general applications will not be able to leverage the extensive database of structures and knowledge of antibody structures as we have done here.
We have exploited the larger antibody structural database available and combined several structural modeling techniques to create high-resolution antibody models. The antigen binding surface is collectively determined by the conformation of the CDRs as well as the relative orientation of the variable domains, and the combination of CDR loop modeling and rigid-body minimization of the entire variable region allows fine control over the geometry of the binding surface. An important contribution of this work is a detailed accounting of the accuracy and error associated with each modular task in the homology modeling process, providing benchmarks for measuring future efforts to improve antibody structural accuracy. The global conformation of the non-H3 CDRs is predicted to an accuracy of 1 Å or better in about 70% of the targets. The deviations of the CDR global conformation in the homology models usually arise as a result of the incompatibility of combining loop and framework templates from different crystal structures. In a few cases, the target CDR adopts a non-canonical crystal structure conformation in which case comparative modeling based on a database search may not produce an accurate outcome.
One of the major limitations of antibody modeling remains loop modeling, both for accuracy and computational cost. Recent advances in conformational sampling37,38 make it possible to consistently predict the conformations of 11–13 residue loops with accuracy better than 1.5 Å. These results were obtained in simulations in which the protein side-chains except those located on the loop retained the native conformation, and the effect of crystal contacts on the loop conformation was included, although there is evidence that dropping the latter advantage does not decrease the prediction accuracy.49 A fairer comparison to the results from our native-recovery simulations results is obtained by considering a recent loop-modeling study using the LoopBuilder protocol on a set of targets different from the ones in our benchmark. The median rmsd of the low-score native recovery models in this study is equal to or lower than the LoopBuilder values for 8-residue and 10-residue loops and larger than the LoopBuilder value for 9 and 11-residue loops. However, the median values of the low-rmsd native recovery model (0.7, 0.8, 1.05 and 1.35 Å for loop lengths eight through eleven) are lower than the corresponding LoopBuilder value for all loop lengths.
A second limitation arises from the accuracy of the energy function, as evidenced by the fact that the low-rmsd models are not always the top-scoring models. Long loops that extend into solution may require a better solvation model. Antibody interfaces are more charged than others,50 and electrostatic energies are difficult to calculate accurately.51 Specifically, the base of the CDR H3 loop frequently has charged residues,17 resulting in dominant inter- & intra-chain electrostatic interactions.52 Because antibody binding regions can be more charged and contain an excess of tyrosine and tryptophan residues, the homology modeling and docking decoys (available upon request) would be useful for testing improved electrostatic models and better aromatic representations.
The antibody-antigen docking performance compares well to that in the latest rounds (Rounds 6–12) of the CAPRI challenge, where predictor groups worldwide produced high-accuracy prediction for only one of seven targets and only three targets received medium or acceptable quality predictions.45 In the recent rounds, the two structures which required homology modeling of at least one docking partner had only acceptable quality docking predictions.45 In this study, the antibody predictions are able to exploit more homology information than the CAPRI targets, and the docking has been limited to explorations of the local docking funnel, but the results are encouraging for the first large-scale docking study using antibody homology models. Docking using the low-rmsd RosettaAntibody models is successful for seven of the fifteen docking targets. Since the low-rmsd model is not known a priori and given the difficulties of achieving high-resolution accuracy in a homology model, an alternative approach is to generate multiple models to capture the extent of the uncertainty. In cases presented here, using the ten top-scoring models was sufficient to significantly improve docking accuracy through ensemble docking. Many flexible backbone docking methods are under active development53 and may eventually enable explorations of more dimensions of the uncertainty of a homology model, such as the H2 loops which often interact with the antibody and the long L1 loops that were difficult to model accurately with grafting.
It would be useful for scientists who use the RosettaAntibody protocol in blind, predictive applications to estimate the confidence expected in docking predictions. We have observed a weak correlation in the docking accuracy (by ligand rmsd) with the mean sequence identity of the non-H3 CDRs and VH-VL templates and the CDR H3 length (Pearson test, significance level, α=0.0554). Targets with template sequence identities below 90% and/or CDR H3 loop lengths of 10 or above have a less than 35% chance of producing one of the ten top-ranked docking models with a ligand rmsd below 8 Å. These conclusions have been made on the small dataset of fifteen targets, so these trends should be interpreted with caution.
The antigen docking results show that loop rmsd is not the sole indicator of the “functional accuracy” of a homology model. The functional accuracy of a homology model in docking simulations varies by target and is determined by some of the canonical loop conformations as well as the H3 modeling accuracy. Each modeling step is interdependent and the overall model quality is a complex function of the multiple structural elements. These findings are likely to apply beyond antibody modeling to homology modeling in general. That is, the value in homology models is the utility of the model for drawing scientific conclusions. Such utility could be measured by the ability to predict the effect of mutations or the ability to predict binding partners and complexed conformations, in addition to general inference of function.55 The comparative modeling division of the Critical Assessment of Structure Prediction has recently achieved some encouraging cases of rmsd improvement on the template rmsds,56 but as structure prediction techniques become more physically realistic, a more revealing measure might be an assessment of the functional accuracy of structural models.
We have introduced an antibody homology method that is more physically realistic than previous treatments31 through the use of an updated antibody database and a combination of interdependent protein structure prediction techniques including loop modeling, docking, and high-resolution refinement. The models were demonstrated to be useful for antigen docking, particularly when an ensemble of antibody models was used to provide robustness. RosettaAntibody models promise to widen the applicable targets for practical structure-based antibody engineering tasks such as predicting structural changes due to antigen binding,57 predicting the structural consequences of antibody humanization58,59 including those related to the change in the VL-VH orientation,60 dissecting structural aspects relevant to antibody cross-reactivity and specificity,61,62 and improving antibody affinity.63,64
From the approximately 700 antibody structures in the Summary of Antibody Crystal Structures (SACS) compilation67 (http://www.bioinf.org.uk/abs/sacs), we curated a subset of 569 structures (502 VL-VH dimers, 50 VL dimers and 17 VH dimers) by eliminating NMR structures and crystal structures with resolution above 3.5 Å. When multiple copies of structures were available, we arbitrarily chose one copy. The resulting set serves as the primary knowledge-base for the comparative modeling aspects of the protocol such as the framework and CDR template identification and CDR H3 kink modeling.
For this study, antibody residues are numbered according to the Chothia numbering scheme.12 VL CDR loops L1-L3 are constituted by residues 24–34, 50–56 and 89–97, respectively, and VH CDR loops H1-H3 are constituted by residues 26–35, 50–56 and 95–102, respectively. Residues outside the CDR definitions constitute the framework regions.
A new set of 54 antibody crystal structures serves as the target benchmark for testing the protocol including 18 targets featured in the original WAM study.31 Isolated antibody structures were chosen if possible, but some antibody structures were extracted from the co-crystal structure of the antigen-bound complex. The set of 54 antibodies covers a wide range of CDR H3 loop lengths (4–22 residues) to illustrate the scope and limitations of loop modeling. To emulate blind prediction, when database information is used in modeling, only non-related (less than 90% sequence identity) antibody structures in the PDB were used.
Figure 1 depicts the flowchart for the steps in the RosettaAntibody protocol. The high-level steps are:
Steps 5 and 6 are more fully detailed below.
We use fragment insertion for loop modeling of the CDR H3 loop, where three residue fragments are sets of backbone dihedral angles (ϕ,ψ,ω) obtained from known protein structures of similar sequence and secondary structure with idealized bond lengths and angles.28,71 Fragments from H3 loops may capture the dihedral angle preferences of the H3 loop and the steric restraints due to its immediate environment. However, the Rosetta fragment database was originally constructed using a non-redundant set of protein sequences.71 The high degree of overall sequence similarity of antibodies masked the variation in the H3 sequence resulting in few database fragments from H3 loops. Therefore for CDR H3 modeling, we augmented the fragment database using the H3 loops in the set of 230 antibody structures with non-redundant H3 sequences.
The H3 C-terminus residues n-2 through n+1, or base, often exhibits a “kinked” conformation (pseudo-dihedral angle of the four Cα atoms approximately between −10° and 70°).16 We use sequence-based rules16 to detect likely kinked loops, and then we build identified loops using a special fragment library of kinked conformations from the antibody structure database, preferentially choosing fragments from loops of similar lengths or base sequence. For extended loops (pseudo-dihedral angle between 125° and 185°), we model the base using standard 3-residue Rosetta fragments.
The loop modeling strategy is a modified version of that of Rohl et al.,72 similar in spirit to that of Wang et al.73 Since several of the details differ, we provide specifics of the algorithm in the following sections. Most notably, CDR H3 loop modeling is coupled to simultaneous optimization of non-H3 CDR conformations and refinement of the relative VL-VH orientation, which differs from previous loop modeling efforts. Also, the new loop modeling algorithm is approximately five times faster than the implementation by Wang et al.73
Initially, a low-resolution Monte Carlo algorithm of fragment insertion moves generates an H3 loop library of 2000 conformations while holding constant the conformations of the non-H3 CDR loops and the VL-VH orientation. All residues are represented by the four backbone atoms (N, Cα, C and O) and one centroid pseudo-atom for the side chain.71 First, the backbone dihedral angles of the H3 base residues are set equal to those of a random kinked or extended fragment. Next, cycles of Monte Carlo moves attempt 25 fragment insertions in each 3-residue window for residues 1 through n-3. CCD moves close the loops after fragment insertion. The loop conformation with the best energy is recovered at the end of the Monte Carlo simulation. A final round of CCD, with a maximum tolerance of 0.01 Å for the gap at the loop discontinuity, closes the loop. If a gap persists, the protocol is repeated starting with the fragment insertion.
Upon successful completion of the low-resolution search, the side-chain conformations of the residues are recovered from the template structures and the side chains of the H3 loop and its neighbors (8.0 Å cutoff) are repacked with a simulated annealing rotamer packing approach.74 A 50-cycle Monte Carlo minimization docking procedure41 optimizes the VL-VH relative orientation by moving the VH rigid-body via random translations of mean 0.1 Å in each Cartesian direction and by random rotations of mean 2.0° around each Cartesian axis. Subsequently, gradient-based minimization28,73 simultaneously alters the conformations of the non-H3 CDR loops and the VL-VH rigid-body orientation using a fold-tree graph to control propagation of structural changes.73,75 Finally, the backbone and side-chain positions of the CDR H3 loop are refined with an all-atom protein representation in an environment where the non-H3 CDR loops and the VL-VH interface are held fixed. The backbone (ϕ, ψ, ω) and side-chain dihedral angles (χ) of the loop and its neighbors are perturbed in a refinement cycle comprising of small, shear and CCD moves (five each), followed by a single instance of explicit gradient-based minimization.28 The small, shear and CCD moves alter only the loop backbone. The minimization allows additional backbone flexibility in the two residues that flank the loop on each side. The loop apex is perturbed to a greater extent by this combination of moves, while the effect is minor on the stem region. The structure with the best Rosetta energy obtained during the course of the simulation is subjected to a final round of minimization.
Antibody-antigen docking methodology followed that as described by Gray et al. and Wang et al.39,41,76 Ensemble docking followed the conformation selection docking method as described by Chaudhury & Gray.29 For local docking, partners are structurally superposed on the co-crystal structure of the antibody-antigen complex, and the coordinates of the antigen are perturbed to create 1000 docking models that populate a cloud of points 30–50 Å ligand rmsd about the starting position. While docking with the ensemble of the ten low-scoring RosettaAntibody models, to allow an average of 1000 docking models corresponding to each conformer in the ensemble, 10,000 docking models are generated for each target. Interface energy73 is used to rank and discriminate the structures produced by the docking simulations.
Ligand rmsd is the deviation of the N, Cα, C and O backbone atoms of the antigen after superposition of the antibody backbone atoms. Interface rmsd is the deviation of the backbone atoms at the interface after optimal superposition of those same atoms, where the interface is defined as all residues within 10 Å of a non-hydrogen atom from the other docking partner. Interface score is the component of the total docking score that arises from residue-residue contacts at the antibody-antigen interface. Residue-residue contacts exist when a residue is within 5 Å of a non-hydrogen atom from the other docking partner. The docking models are assigned CAPRI77-style “high”, “medium”, “acceptable” or “incorrect” rankings that depend on the rmsd-to-native of the ligand position, the interface rmsd to native and the fraction of native residue-residue contacts that are recovered in the docking model.44
The antibody homology modeling methods presented here are freely available for academic and non-profit use as part of the Rosetta structure prediction suite (http://www.rosettacommons.org). The distribution includes the antibody databases, supporting scripts, documentation and full source code.
We have also implemented a web server (http://antibody.graylab.jhu.edu) which implements the algorithm to predict antibody structures from sequence. The server returns BLAST hits for each framework chain and CDR loop as well as the final ten top-scoring antibody homology models after complete refinement including VL-VH docking.
We thank Eric Kim for developing the antibody structure prediction server and David Masica for comments on the manuscript. This work was supported by National Institute of Health (NIH) grant GM078221. Protein structures were created using PyMOL,65 plots were generated using the R statistical package66 and rmsd was calculated using ProFit (Martin, A. C. R., http://www.bioinf.org.uk/software/profit).