|Home | About | Journals | Submit | Contact Us | Français|
Accurate in silico models for the quantitative prediction of the activity of G protein-coupled receptor (GPCR) ligands would greatly facilitate the process of drug discovery and development. Several methodologies have been developed based on the properties of the ligands, the direct study of the receptor-ligand interactions, or a combination of both approaches. Ligand-based three-dimensional quantitative structure-activity relationships (3D-QSAR) techniques, not requiring knowledge of the receptor structure, have been historically the first to be applied to the prediction of the activity of GPCR ligands. They are generally endowed with robustness and good ranking ability; however they are highly dependent on training sets. Structure-based techniques generally do not provide the level of accuracy necessary to yield meaningful rankings when applied to GPCR homology models. However, they are essentially independent from training sets and have a sufficient level of accuracy to allow an effective discrimination between binders and nonbinders, thus qualifying as viable lead discovery tools. The combination of ligand and structure-based methodologies in the form of receptor-based 3D-QSAR and ligand and structure-based consensus models results in robust and accurate quantitative predictions. The contribution of the structure-based component to these combined approaches is expected to become more substantial and effective in the future, as more sophisticated scoring functions are developed and more detailed structural information on GPCRs is gathered.
GPCRs are a large superfamily of membrane-bound proteins that, upon recognition of specific extracellular stimuli, give rise to a complex signaling cascade generally starting with the activation of heterotrimeric G proteins. Typically, the stimuli are provided by extracellular endogenous ligands, such as neurotransmitters and hormones, exogenous ligands, such as odorants and tastes, or light photons, as it applies to the case of rhodopsin . Due to the plethora of physiological implications of their signaling, GPCR constitute the single protein family most targeted by the currently marketed drugs .
All GPCRs share a common structure, consisting of a single polypeptide chain that transverses seven times the cell membrane with seven alpha-helical transmembrane domains (TMs). Despite the large number of GPCRs, for many years direct structural information was available only for bovine rhodopsin, mainly through the 2D projection map disclosed by Schertler and coworkers in 1993 and the three-dimensional (3D) crystal structure solved for the first time by Palczewski and coworkers in 2000 [3,4]. Thus, structure-function studies of GPCRs relied heavily on sequence and phylogenetic analyses, rhodopsin-based homology modeling, and docking studies supported by site-directed mutagenesis and ligand structure-activity relationships (SAR) data [5,6]. In this context, experimentally validated GPCR homology models have proven to be valuable tools for lead identification, and the scientific literature flourished with successful rational drug design and virtual screening examples [7–11]. In 2007 Kobilka and coworkers unveiled the 3D crystal structure of the human β2-adrenergic receptor (β2-AR), finally providing the long awaited proof that the structure of GPCRs generally resembles that of rhodopsin [12–15]. Comparison between rhodopsin-based models of the β2-AR in complex with the inverse agonist carazolol and the corresponding crystal structure supports and encourages the applicability of GPCR homology modeling and molecular docking to computer-aided drug discovery, at least for qualitative purposes .
However, computer-assisted optimization of lead compounds toward the design of molecules with improved affinity is not a straightforward matter, as it requires models capable of quantitatively calculating and ranking the activity of series of analogs. Predicting with accuracy free binding energy and/or potencies of ligands is in general a daunting task. Due to the paucity of direct structural information, the quantitative prediction of the biological activity of GPCR ligands presents even greater difficulties. A number of approaches have been proposed to address this challenging issue: most of them rely on ligand-based QSAR, while others attempt a direct calculation of receptor-ligand interactions.
Under the label of ligand-based 3D-QSAR are enlisted several QSAR methodologies that attempt to correlate the biological activities of molecules with a number of location-dependent measures that describe their molecular properties, without any explicit calculation of their interactions with the targets [17,18].
Among the various 3D-QSAR approaches, Comparative Molecular Field Analysis (CoMFA) is one of the earliest examples and perhaps still the most widespread. Introduced in 1988 by Cramer and coworkers at Tripos, CoMFA relies on a 3D description of molecular properties based on the analysis of molecular fields detected by a probe atom and mapped on a grid . Fundamental to the development and success of CoMFA has been the introduction of the Partial Least Square (PLS) statistical approach applied to chemistry by Wold in 1984, which allows meaningful correlations in situations in which the variables outnumber the observations . A CoMFA study typically analyzes electrostatic fields, in the form of a Coulombic function, and steric fields, in the form of a Lennard-Jones function . In 1994, Klebe and coworkers introduced the Comparative Molecular Similarity Indices Analysis (CoMSIA) approach, in which molecular fields are expressed in the form of Gaussian-type functions. To attempt an account of the entropic contribution to receptor-ligand binding, alongside steric and electrostatic fields CoMSIA considers also hydrophobic fields .
Before the disclosure of the 3D structure of rhodopsin, the receptor-based design of GPCR ligands was not even feasible through homology modeling. Thus, many researchers turned to 3D-QSAR in an attempt to shed light on the still vague receptor-ligand interactions. An article published by Greco and coworkers in 1991 is probably the first account of the applicability of CoMFA to GPCRs . In their paper, the authors described the application of CoMFA to the study of a set of non-congeneric agonists of the muscarinic receptors, which led to a model consistent with the postulated mechanism of interaction. In the following years, CoMFA and CoMSIA continued to be successfully applied to the prediction of the activities of GPCR ligands. The publication of the crystal structure of rhodopsin gave a new impetus to 3D-QSAR studies, as receptor docking became a practical way of superimposing ligands in the 3D space – a special section of this review is dedicated to combined ligand and structure-based modeling. Among GPCRs, the adenosine receptors have been extensively studied by means of CoMFA and CoMSIA. The first paper featuring the description of a CoMFA study applied to adenosine receptors was published by Jacobson and coworkers in 1995 and dealt with the 3D-QSAR of a series of N6-benzyladenosine-5’-uronamide analogs as agonists of the A3 subtype . Many similar studies followed in subsequent years, conducted by the same group as well as various other laboratories [24–37]. In this context, in 2007 Kim and Jacobson developed CoMFA and CoMSIA models based on structurally diverse adenosine analogues and successfully elucidated not only the molecular basis for their affinity but also those for their relative efficacy at the human A3 receptor .
As an alternative to CoMFA, the GRID/GOLPE combination has also been successfully used to generate predictive 3D-QSAR models. The program GRID, published by Goodford in 1985, allows a thorough exploration of the molecular interaction fields with a large number of different molecular probes . The descriptors produced with GRID can then be subjected to chemometric analysis with the program GOLPE (General Optimal Linear PLS Estimation), developed by Clementi and coworkers, which offers the advantage of an advanced variable selection procedure . The GRID/GOLPE analysis has been successfully applied to the development of 3D-QSAR models for the prediction of the activity of ligands of a number of GPCRs, including adrenergic, dopamine, serotonin and opioid receptors [40–44].
Another viable and convenient way for the derivation of descriptors for 3D-QSAR models was identified by Gasteiger and coworkers in the calculation of autocorrelation vectors of molecular electrostatic potentials (MEPs) mapped over the molecular surfaces of the ligands [45,46]. It is well accepted that MEPs considerably affect and govern molecular recognition. Furthermore, spatial autocorrelation vectors offer the advantage of being independent of the 3D orientation of the molecules. On these bases, Moro and coworkers recently introduced the autocorrelation of molecular electrostatic surface properties combined with partial least square analysis (autoMEP/PLS) as an attractive alternative to CoMFA [36,47–49]. The authors applied the novel strategy to the study of the activity of antagonists of the adenosine A3 receptor. Their results demonstrate that autoMEP/PLS has a predictive power comparable to that of CoMFA, with the advantage of not requiring 3D alignment of ligands. This property makes autoMEP/PLS a particularly attractive tool to generate 3D-QSAR models in cases in which the receptor-ligand interactions have not been clearly established, thus preventing the use of structure-based methodologies to generate the 3D alignment of the ligands.
Aside from 3D-QSAR, other approaches to QSAR have also been successfully applied to the prediction of the activity of GPCR ligands, as extensively reviewed by Tropsha and coworkers .
The most direct method for the quantitative prediction of ligand-protein affinity is the calculation of the free energy of binding from the 3D structure of the complexes while employing first principle techniques, such as free energy perturbations (FEP) or thermodynamic integration (TI). Especially when dealing with high-resolution protein crystal structures, these methodologies have proven the ability to reproduce with accuracy experimentally measured binding affinities [51–54]. In the field of GPCRs, a successful attempt to calculate the free energy of binding between the cholecystokinin-1 receptor (CCK1R) and a cholecystokinin peptide composed of nine amino acids (CCK9) was made by performing “alchemical” transformation of the peptide into a mutated one with known affinity for the receptor using FEP in a water-lipid environment . The initial protein-ligand complex for this study was generated by means of a number of optimizations of a rhodopsin-based homology model of CCK1R conducted in an iterative manner with site-directed mutagenesis experiments. The very encouraging reproduction of relative free binding energies supported the validity of the model and the experimentally derived assumptions on the basis of which it was constructed.
However, first principle calculations are time-consuming and require a great deal of effort for the preparation of the system and the optimization of the sampling. Therefore, they are not suitable for the study of many ligand-protein complexes. To overcome these limitations, in 1994 Aqvist and coworkers introduced the linear-interaction energy (LIE) method, also known as the linear response method (LRM), which offers a more simplified and rapid approach to the calculation of the free energy of binding [56,57]. By means of molecular dynamics or Monte Carlo simulations, average electrostatic and van der Waals energies are calculated for the ligand in the bound state and free in solution. In the earliest formulation of the LIE method, the free energy of binding is represented as a linear combination of the differences between the energies in the two states according to the formula:
where the scaling factor α is empirically determined, while β is set to 0.5. Subsequently, the methodology has been extensively developed and refined giving rise to a number of alternative variants. Notably, the most recent versions feature an empirically determined β factor and a third term representing the solvent accessible surface area (SASA), as proposed by Jorgensen and coworkers . Furthermore, the methodology has been adapted in order to function with Poisson-Boltzmann (PB) and Generalized Born (GB) continuum models of solvation with levels of accuracy comparable to those obtained with explicit solvent .
Regarding GPCRs, there have been a few attempts of calculating ligand binding energies through the LIE method. In particular, Gutiérrez-de-Terán and coworkers applied it with the aim of discriminating among two different binding modes of agonists of the adenosine A1 receptor generated by automatic docking . More recently, we applied the LIE method to the calculation of the activity of a series of antagonists of the P2Y1 receptor . The method did not show the ability to rank the activity of a test set of adenosine-3’-5’-bisphosphate analogs. However, it yielded relatively low root-mean-square errors of predictions and demonstrated substantial independence from the training set, supporting the applicability of structure-based methodologies to the discrimination between binders and non-binders. The combination of this specific model with ligand-based 3D-QSAR toward the prediction of the activity of P2Y1 antagonists is further discussed in the next section.
A more simplified and rapid quantitative description of ligand-protein interactions can be obtained through the application of scoring functions, which are intended for a rough high-throughput estimation of binding affinities. Scoring functions can be classified into three groups: force field-based, empirical, and knowledge-based [18,61–66]. The first are based on the evaluation of the affinity on the basis of molecular mechanics energy terms for non-bonded interactions, such as van der Waals and electrostatic energies. Knowledge-based scoring functions are based on the decomposition of ligand-protein binding into a list of interactions between atom pairs, which are evaluated with respect to a database of experimentally determined protein-ligand complexes [63–66]. The interaction between a particular atom type of a protein and a particular atom type of a ligand located at a specific distance is considered more or less favorable proportionally to the number of times it occurs in the database. Lastly, empirical scoring functions are based on a number of descriptors intended to reflect different contributions to binding affinity, such as steric and electrostatic interactions, hydrogen bonds, lipophilic and entropic effects, and solvation, all calibrated to fit experimental data on the basis of given training sets.
The empirical scoring function of the internal coordinate mechanics (ICM) program includes terms accounting for van der Waals interactions, ligand conformational entropy loss, hydrogen bond formation and disruption, solvation electrostatic energy change, hydrophobic free energy gain. In 2003, Cavasotto and coworkers demonstrated the applicability of this scoring function to the identification of the binding site of retinal in bovine rhodopsin and to the discrimination between binders and nonbinders of rhodopsin . Similarly, using molecular databases spiked with known binders, a number of studies based on combinations of different scoring functions (such as Gold , Dock , CScore (Tripos), Fresno , Score , FlexX , and PMF ) have demonstrated the applicability of molecular docking at GPCR models to virtual screening [9,10,73]. In agreement with the expectations set by these studies, a variety of docking programs and scoring functions have been successfully applied to the discovery of novel ligands for a plethora of GPCRs, including neurorokinin [74,75], adrenergic , chemokine [75,77], dopamine , serotonin [75,78], cannabinoid , and free fatty acid receptors , among others.
A crucial element in the construction of 3D-QSAR models is the structural alignment of the ligands, which may prove particularly challenging in the case of structurally diverse or highly flexible compounds. To overcome these difficulties, in 1993 Marshall and coworkers proposed the utilization of alignment techniques based on binding site geometries and minimizations within binding sites as effective alternatives to conformational searches and superimpositions solely based on the molecular properties of the ligands [81,82]. In the following years, with the development of sophisticated automatic docking algorithms and programs, this combined approach became increasingly more feasible and appealing. Much work needs to be done to improve the scoring functions towards a direct estimation of free energies of binding (see structure-based methodologies section). However, docking programs have demonstrated a substantial reliability in the prediction of ligand binding modes and bioactive conformations . Dealing with crystal structures of nuclear hormone receptors and enzymes, Sippl amply demonstrated the advantages of combining the accuracy of ligand alignments obtained through docking with the computational efficiency of 3D-QSAR methodologies [84–87]. Receptor-based 3D-QSAR – in which receptor-ligand complexes are not used to directly calculate binding interactions, but only to generate the 3D alignment of the ligands – has also been applied successfully to GPCRs, by means of docking studies conducted at homology models derived using rhodopsin as a structural template [7,37,44,48,88–91]. Notably, Moro and coworkers adopted this approach to obtain a quantitative model for the prediction of the activities of antagonists of the human A3 receptor. On the basis of this model, the authors designed, synthesized, and tested a set of novel compounds. The experimentally determined affinities of the novel antagonists were remarkably close to their predicted values, with an r2 of 0.873 [37,48].
Recently, we combined receptor-based QSAR techniques and purely structure-based techniques to generate a model for the prediction of the activity of antagonists of the P2Y1 receptor . We docked a set of antagonists to a rhodopsin-based model of the receptor and, as mentioned earlier in the paper, calculated docking scores and free binding energies via the LIE method, as implemented in the Schrödinger software package. In parallel, we used the alignment resulting from docking for the construction of receptor-based CoMFA and CoMSIA models. Additionally, in order to benefit from the strength of each technique and to compensate for their inherent limitations, we generated a consensus PLS model named LIST-CM, featuring as descriptors the activities independently predicted through receptor-based 3D-QSAR and through the direct calculation of receptor-ligand interactions. A schematic representation of this combined approach is provided in Figure 1. To test the predictive power of this model, we designed, synthesized, and tested a challenging test set of novel antagonists, each of which featured at least one substituent not present in the training set. With the training set, receptor-based 3D-QSAR methodologies demonstrated a predictivity far superior to that of purely receptor-based techniques. These results were expected, considering that homology models cannot provide a sufficiently high resolution picture of the binding site to rank with accuracy the activities of a series of analogues. Although not endowed with the ability to rank analogs, as mentioned, purely receptor-based methodologies showed relatively low root-mean-square errors of prediction together with a substantial independence from the training set, thus confirming their applicability to lead identification.
With a challenging test set, our consensus model that combines receptor-based 3D-QSAR and purely receptor-based techniques proved to be more robust than any individual model, and resulted in the accurate prediction of the activity of six out of the eight new compounds . These results suggest that the combination of a number of orthogonal methodologies might work synergistically toward obtaining improved, more robust models.
In silico methodologies for the quantitative prediction of the affinity of small molecules of pharmaceutical interest would greatly facilitate the process of drug discovery and development. However, due to the lack of direct 3D structural information, the application of these methodologies to the study of the activity of GPCR ligands is particularly challenging.
Ligand-based 3D-QSAR techniques have been applied for years in order to rationalize the structure-activity relationships of GPCR agonists and antagonists with the aim of producing models capable of quantitatively predicting the activity of new analogs. Being based exclusively on the properties of known ligands, these techniques offer the advantage of not requiring knowledge of the receptor structure. This aspect made possible their application to the discovery of GPCR ligands even before the publication of the experimentally elucidated rhodopsin structures that opened the way to GPCR homology modeling. Ligand-based 3D-QSAR techniques are capable of generating accurate and robust models leading to precise ranking of the activity of series of analog compounds. However, these models are highly dependent on the training sets. Thus, they are not applicable to situations in which activity data for a set of ligands of a specific target are not available. Furthermore, they often fail in the prediction of the activity of molecules with a structure significantly diverging from those included in the training set.
Conversely, structure-based methodologies rely on the direct estimation of the free energy of binding from the 3D structure of receptor-ligand complexes. Although essentially independent from a priori knowledge of the activity of ligands at the studied target, direct methods for the evaluation of the receptor-ligand affinity are either too computationally expensive to be considered for high-throughput drug discovery purposes or too approximate to yield meaningful affinity rankings. Furthermore, in most cases homology models do not provide the level of accuracy necessary to rank the affinity of a series of analogues to be exploited for lead-optimization purposes. However, structure-based methodologies often have a sufficient level of accuracy to allow an effective discrimination between binders and nonbinders. This, together with their substantial independence from training sets, allows their successful application in the quest for novel and structurally diverse lead compounds by means of virtual screening techniques.
Despite the inaccuracy and vagueness of scoring functions, docking programs have demonstrated the ability of producing sensible conformations of receptor-ligand complexes. In the case of GPCRs, this is especially true if the models incorporate experimental data usually obtained through site-directed mutagenesis and chemical modification of the ligands. On this basis, molecular docking appears to be a very good alternative to techniques based exclusively on the electrostatic and steric properties of the small molecules to yield the bioactive conformations and 3D superimpositions of the ligands necessary to generate 3D-QSAR models. The receptor-based 3D-QSAR models obtained in this fashion, besides being generally characterized by accuracy and robustness, also allow the docking of the generated QSAR contours into the model of the receptor, thus generating hybrid models intended to explain the structure-activity relationships in terms of receptor-ligand interactions. Additionally, although not endowed with ranking ability when considered alone, structure-based predictions might contribute to the construction of robust consensus models when combined with 3D-QSAR predictions . The contribution of structure-based methodologies to these combined approaches is expected to increase significantly in the future, with the advent of additional direct 3D structural information for GPCRs and the introduction of more sophisticated and accurate scoring functions.
The authors thank Dr. Santiago Vilar Varela for the helpful discussions. This research was supported by the Intramural Research Program of the NIH, NIDDK.