The cytochrome P450 (CYP) gene family strongly influences drug development. We determined potency values for 17,143 compounds against recombinant CYP 1A2, 2C9, 2C19, 2D6, and 3A4 enzymes through an in vitro bioluminescent assay. The compound collections included substances from typical libraries and FDA-approved drugs. Cross-library isozyme inhibition (30–78%) was observed with important differences between collections. While only 7% of the typical screening library was inactive against all five isozymes, 33% of FDA-approved drugs were inactive, reflecting the optimized pharmacological properties of the latter. Unexpectedly, drugs exhibited less activity towards the CYP 2C9 and 2C19 isozymes compared to un-optimized collections. We then identified substructures that differentiated between the five isozymes as well as substructures trending towards active or inactive categories. We describe here a pharmacological compendium to further the understanding of CYP isozymes.
The human cytochrome P450 (CYP) family comprises 57 isozymes. These enzymes function in normal metabolism, influence drug pharmacokinetics, and can cause adverse outcomes in patients through drug-drug interactions (DDIs).1, 2 The CYP isozymes metabolize approximately two-thirds of known drugs in humans, with 80% of this metabolism attributable to five isozymes: 1A2, 2C9, 2C19, 2D6, and 3A4.3 Efforts to minimize CYP isozyme liabilities have increased through the incorporation of early-stage in vitro metabolic characterization in drug discovery.4
Studies of compound interactions with CYP isozymes have been described,5–8 but these works have addressed limited compound collections (fewer than a few hundred members). Additionally, technologies, assay conditions, and data analysis methods are seldom conserved between studies, hindering comprehensive comparisons.9 Despite the high interest in this gene family, few public databases exist (see for example www.bindingdb.org) and the scientific literature remains fragmented, making knowledge-advancing data mining difficult.
To generate a public database useful for identifying metabolic liabilities within early leads, we tested CYP members under identical conditions against >17,000 small molecules using quantitative HTS (qHTS). In qHTS, libraries are assayed at multiple concentrations, producing concentration-response curves (CRCs) and potencies for every compound.10 We have previously shown the utility of qHTS in defining the activity within assays using either purified enzymes10–12 or complex cell-based activity profiles.13,14 Here, we apply qHTS to assay five of the major drug metabolizing enzymes, CYP 1A2, 2C9, 2C19, 2D6 and 3A4, with a bioluminescence-based detection technique that employs pro-luciferin substrates and firefly luciferase.15 The qHTS was performed against samples from the Molecular Libraries Small Molecule Repository (MLSMR; see http://www.ncbi.nlm.nih.gov/sites/entrez?db=pcsubstance&term=mlsmr) as well as against compounds with known or targeted biological activity, including FDA-approved drugs. This analysis suggests that low CYP 2C isozyme activity is a common property of drugs, whereas other isozymes such as CYP 2D6 showed little discrimination between the MLSMR and drugs. Using this dataset we identified isozyme-selective substructures exhibiting tendencies toward inhibitory or inactive categories of activity. We expect the CYP bioactivity database described here, and available within PubChem, to provide a foundation for testing and improving current CYP activity prediction models as well as guiding the use of in vitro CYP assays in early-phase drug discovery efforts.
Using a bioluminescent assay we tested 17,143 samples at between seven and fifteen concentrations for all five CYP isozymes. The samples consisted of 8,019 compounds from the MLSMR, including compounds chosen for diversity and rule-of-five compliance,16 synthetic tractability, and availability; 6,144 compounds from a set of biofocused libraries, which included 1,114 FDA-approved drugs; and 2,980 compounds from combinatorial libraries containing privileged structures targeted at GPCRs and kinases, and libraries of purified natural products or related structures. In qHTS, the Hill equation is fit to the data to generate CRCs for every compound tested in the manner described by Inglese et al.10 CRCs were divided into five categories based on the potency, efficacy and quality of the curve fit to the observed response: high (category 1) and low (category 2) confidence inhibitory CRCs, high (category 3) and low (category 4) confidence activator CRCs, and, if no response was observed up to the highest tested compound concentration (57 µM), inactive (category 5). All five isozyme assays showed good performance, with Z'-factors averaging approximately 0.6. In re-testing a set of 91 randomly selected compounds we observed excellent confirmation of activity (84–90%) for all five isozymes (see Online Methods and Supplementary Fig. 1 online). We note that inhibition in the present dataset could be due to compounds acting as inhibitors or as substrates: both may decrease the bioluminescent signal through a reduction in the free enzyme concentration required to convert the pro-luciferin substrates (see Supplementary Fig. 2 online). The qHTS data for each of the five CYPs are shown in Figure 1a and are available in PubChem (assay identifier # 1851).
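As an illustration of the two quantities mentioned above, the following minimal sketch evaluates a four-parameter Hill curve and computes a Z'-factor from control wells. It is not the fitting pipeline used in the study: the function names, the percent-activity scale (0 for no effect, -100 for full inhibition) and the example control values are assumptions made for this sketch, and no curve fitting or category assignment is attempted.

```python
import statistics


def hill_response(conc_um, ic50_um, hill_slope, bottom=-100.0, top=0.0):
    """Four-parameter Hill model for an inhibitory curve.

    Assumed scale: top = response with no compound (0% change),
    bottom = response at saturating compound (-100% = full inhibition).
    """
    return bottom + (top - bottom) / (1.0 + (conc_um / ic50_um) ** hill_slope)


def z_prime(pos_controls, neg_controls):
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    spread = 3.0 * (statistics.stdev(pos_controls) + statistics.stdev(neg_controls))
    window = abs(statistics.mean(pos_controls) - statistics.mean(neg_controls))
    return 1.0 - spread / window


if __name__ == "__main__":
    # Hypothetical compound with IC50 = 10 uM, evaluated at the top tested
    # concentration quoted in the text (57 uM).
    print(hill_response(57.0, ic50_um=10.0, hill_slope=1.0))
    # Hypothetical control wells (arbitrary luminescence units).
    print(z_prime([100.0, 102.0, 98.0], [0.0, 2.0, -2.0]))
```

By construction the modelled response at a concentration equal to the IC50 is halfway between `top` and `bottom`, which is the sense in which potency is read from each CRC.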
All five isozymes exhibited a high degree of activity for the 17K compound collection, with the predominating activity being of an inhibitory nature (30–78% of the collection). High confidence category 3 activating CRCs (compounds that increased the rate of pro-luciferin conversion) were only appreciably observed in CYP 3A4 (2.5%) and CYP 2C9/2C19 (approximately 3.5% each). Activation of CYP isozyme activity is typically substrate dependent,17,18 so the present assays would not be expected to comprehensively characterize this type of behavior.
The distribution of IC50s within both category 1 and 2 actives is shown in Figure 1b. Selectivity was observed between the biofocused and MLSMR subset. For example, CYP 2D6 showed a similar frequency of activity between the two collections while the biofocused library was considerably less active against CYP 2C9 and 2C19 (Fig. 1b). A lower frequency of activity in the biofocused collection was also observed for isozymes 1A2 and 3A4. When the distribution of potency between the MLSMR and drug sets was compared we found that differences were not significant except for CYP 1A2 and 2D6 where the drug set appears to be less potent than the MLSMR set against CYP 1A2 and more potent than the MLSMR set against CYP 2D6 (p<0.01; Supplementary Fig. 3 online).
We observed differences between the biofocused, MLSMR and combinatorial chemistry compounds in both the number and distribution of inhibitory CRCs across the five CYP isozymes. In the MLSMR an average of 58±16% of compounds were found to be active against any specific isozyme. The biofocused library showed approximately half this activity (average of 32±8%). The FDA drugs showed an activity similar to the biofocused set (average of 31±8%). In contrast, the combinatorial library showed an average activity (52±19%) similar to the MLSMR subset, and CYP 1A2 and CYP 3A4 showed even stronger activity for a specific class of quinazoline compounds (60–75% of this compound class was active).
The large difference in activity between the FDA and MLSMR sets prompted a comparison of compound percentages demonstrating activity against various numbers of isozymes. As can be seen in Figure 2a, 33.3% of FDA-approved drugs were inactive against all five isozymes compared to 7.1% of compounds from the MLSMR subset. Pan-activity was increased approximately two-fold in the MLSMR subset compared to the FDA set (8.0% and 3.8% respectively). Also, there is a steady decline in combination CYP activity observed for FDA-approved drugs. The FDA set was less active, both in terms of the percentage of compounds interacting with any isozyme combination, and the average number of isozymes interacting with each compound.
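The comparison above amounts to counting, for each compound, how many of the five isozymes it is active against and then tabulating those counts per library. A minimal sketch of that tabulation follows; the input layout (a dictionary of per-compound boolean activity calls) and the function name are assumptions made for illustration, not the format of the deposited data.

```python
from collections import Counter

ISOZYMES = ("1A2", "2C9", "2C19", "2D6", "3A4")


def isozyme_hit_distribution(activity_by_compound):
    """Return {number of isozymes hit: percentage of compounds}.

    activity_by_compound maps a compound id to {isozyme: bool}; a missing
    isozyme is treated as inactive (an assumption of this sketch).
    """
    if not activity_by_compound:
        raise ValueError("no compounds supplied")
    counts = Counter(
        sum(1 for iso in ISOZYMES if calls.get(iso, False))
        for calls in activity_by_compound.values()
    )
    total = len(activity_by_compound)
    return {n: 100.0 * counts.get(n, 0) / total for n in range(len(ISOZYMES) + 1)}
```

In the returned dictionary, key 0 corresponds to the pan-inactive fraction and key 5 to the pan-active fraction discussed in the text.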
On comparing the activities of the MLSMR and FDA sets we identified CYP 2C9 and CYP 2C19 as showing the largest differences: 46% and 57% fewer compounds were active against these two CYP isozymes, respectively, in the FDA set (Fig. 2b). Both CYP 3A4 and 1A2 discriminated to a lesser degree (approximately a 24% difference for each), while CYP 2D6 showed little discrimination between the two libraries (5% difference). We also found that when only one isozyme was active this was unlikely to be either CYP 2C19 or 2C9 for the MLSMR (Supplementary Fig. 4 online).
We clustered all 17K compounds based on their structural similarity and represented these as self-organizing maps (SOMs; Fig. 3).19,20 In the SOMs each hexagon represents a cluster of structurally similar compounds, with neighboring hexagons containing more similar structures than distal hexagons. Highly active scaffolds for CYP 1A2 present in the combinatorial library can be seen as blue hexagons (deficient in active compounds) in the bottom right part of the SOMs for four isozymes, but red (enriched in active compounds) for CYP 1A2. This CYP 1A2 cluster is the quinazoline class of compounds mentioned above. The two hexagons in the bottom left corner, which are colored red in the SOMs for CYP 2C19, 2C9 and 3A4 but blue for CYP 1A2 and 2D6, show compounds that are selectively active against the former three isozymes and inactive against the latter two. The number of compounds active against all isozymes is relatively small (n=350). This is apparent in Figure 3, where few hexagons are colored red in all five SOMs. The fact that these compounds are clustered together indicates they share a relatively high degree of structural similarity. Conversely, the number of pan-inactive compounds is quite large (n > 2,000), shown in the SOMs as blue cluster regions across all five SOMs (Fig. 3).
Once the activity data were organized by SOMs we could relate the activity patterns to the genetic similarity of the human CYP isozymes. We hierarchically clustered the five isozymes using the compound activity patterns (with a Minkowski distance as the similarity metric), and the resulting dendrogram is shown in Figure 3. Clustering divided the five isozymes into two major groups: one consisted of CYP 2C19, 2C9 and 3A4, with CYP 2C9 and 2C19 having the most similar activity patterns, and the other consisted of CYP 1A2 and 2D6, although these latter two showed a lesser degree of activity similarity.
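The clustering step can be sketched as follows. The text specifies only that a Minkowski distance was used; the exponent `p`, the average-linkage rule and the representation of each isozyme as a numeric activity vector over compounds are assumptions of this sketch, which is written in plain Python rather than with whatever software produced the published dendrogram.

```python
from itertools import combinations


def minkowski(u, v, p=2.0):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def average_linkage_merges(profiles, p=2.0):
    """Agglomerative clustering of named activity profiles.

    profiles maps an isozyme name to its activity vector over compounds.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of isozyme names.
    """
    clusters = [(name,) for name in profiles]
    merges = []

    def dist(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(minkowski(profiles[a], profiles[b], p) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((c1, c2, dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges
```

The merge history can be read as a dendrogram: early merges at small distances correspond to isozymes with the most similar activity patterns.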
To identify structural features that either confer activity or ensure its absence, we searched for substructures disproportionately represented in particular CRC classes relative to the entire testing set. A selection of the results demonstrating significant population shifts is given in Figure 4 (contact the corresponding author for the complete list). To clarify associations we focus on the category 1 CRC classes. Activating CRC categories are not shown because of the relative dearth of records in this class. It can be seen from Figure 4 that the presence of an aliphatic alcohol group (1) is associated with a significant shift towards the inactive class for four of the five isozymes. The presence of an aromatic hydroxyl group is associated with a similar but weaker trend, as is the presence of an ether linkage (data not shown). The presence of a primary aliphatic amine (2) or a quaternary ammonium salt (5) is also associated with a pan-isoform shift towards the inactive class. In contrast, secondary and tertiary aliphatic amines (3 and 4, respectively) are associated with isoform-specific behavior, shifting towards the inhibitor/substrate response class for CYP 2D6 but towards the inactive class for other isozymes. This is consistent with the known preference of CYP 2D6 for substrates containing basic, protonatable nitrogen atoms.21 The presence of a carboxylic acid moiety (6) is also correlated with a strong shift towards the pan-inactive class. This can be compared with the trends for simple esters, amides, and carbamates, which are generally much weaker and less consistent (data not shown). Imide (7) and urea (8) functionalities show similar patterns to carboxylic acids, although with weaker shifts. In the case of imides, the combinatorial library that contained a high density of this functionality (see Fig. 5) may explain the weaker shift.
Oxime o-ethers (9), sulfonates (10) and phosphorus groups (11) are additional moieties associated with inactive class shifts, although the frequency of occurrence of the latter in the testing set was low.
In terms of simple rings, the presence of oxolanes (12) shows a correlation with a shift towards the pan-inactive class. Aromatic equivalents such as thiophenes (13), furans and pyrroles (data not shown for the latter two) are associated with a shift in the opposite direction. Other aromatic groups such as pyrimidines, indoles, benzodioxoles and naphthalenes (14 to 17) are also generally associated with shifts towards the inhibitor/substrate class, particularly for CYP 1A2 (known to have preference for planar, polyaromatic substrates).6
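One way to quantify whether a substructure is disproportionately represented in a CRC class is to compare the class fraction among compounds bearing the substructure with the class fraction in the whole testing set, and to attach a hypergeometric tail probability to the observed count. The text does not state which statistical test was used, so the test, the function names and the argument layout here are assumptions of this sketch.

```python
from math import comb


def class_fraction_shift(n_sub_in_class, n_sub, n_class, n_total):
    """Class fraction among substructure-bearing compounds minus the
    class fraction in the whole set (positive = enriched in the class)."""
    return n_sub_in_class / n_sub - n_class / n_total


def hypergeom_upper_tail(n_sub_in_class, n_sub, n_class, n_total):
    """P(X >= n_sub_in_class) if the n_sub substructure-bearing compounds
    were a random draw from n_total compounds, n_class of which belong
    to the CRC class of interest."""
    upper = min(n_sub, n_class)
    tail = sum(
        comb(n_class, k) * comb(n_total - n_class, n_sub - k)
        for k in range(n_sub_in_class, upper + 1)
    )
    return tail / comb(n_total, n_sub)
```

A small tail probability together with a positive shift would mark a substructure as enriched in the class; the mirror-image calculation on the inactive class captures shifts in the opposite direction.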
To identify more complicated substructures, we performed a similar analysis to that previously outlined by Inglese et al.10 For each isoform, the set of molecules assigned to category 1, 3 or 5 CRCs was clustered using extended-connectivity fingerprints (Pipeline Pilot 6.1, Scitegic, 2006, http://accelrys.com/products/scitegic). For each cluster the maximal common substructure (MCS) was identified, which was then used to query the whole test set. This process was repeated several times with slightly different parameters for each isoform, in an attempt to ensure that a representative sample was taken. The most significant results from this analysis are summarized in Figure 5. The analysis identified further substructures associated with pan-inactive shifts, including long, aliphatic carbon paths (18), while substructures incorporating simple chemical functionality, such as 19, 22 and 23, showed isoform-specific behavior. Purine scaffold (20) and steroidal (24) compounds appear to be largely inactive, consistent with the role of CYPs as largely metabolic rather than biosynthetic enzymes.8 In contrast, a monosaccharide substructure (23) is associated with isoform-specific behavior, shifting strongly towards the inactive category for CYP 1A2 but in the opposite direction for CYP 3A4. Of particular note is the quinazoline structure (19), where 89% of the compounds containing this moiety were assigned to category 1 CRC for CYP 1A2, compared with only 8% for CYP 2C9. The known preferences of the CYP 1A and CYP 2C families of isozymes for planar, polyaromatic and non-planar substrates, respectively,6 explain this observation. Scaffolds 21 and 22 are contained in two sub-libraries that were included in the testing set and are shown in more detail in Supplementary Figure 5 online. This illustrates a weakness of the analysis method, namely that without visual inspection we cannot judge whether an identified substructure truly constitutes a significant common element.
The automated MCS procedure (and indeed the chemical group analysis) highlighted parts of the scaffolds shown in Supplementary Figure 5 but without further investigation we would not have identified the corresponding sub-libraries.
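The fingerprint-based clustering step that precedes the MCS search can be illustrated with a simple similarity-threshold scheme. This sketch is not the Pipeline Pilot procedure: it assumes each compound has already been reduced to a set of integer fingerprint bits by some external tool, uses Tanimoto similarity with a hypothetical threshold, and applies a greedy leader algorithm. Fingerprint generation and MCS detection themselves are not shown.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of bits."""
    if not bits_a and not bits_b:
        return 0.0
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def leader_cluster(fingerprints, threshold=0.6):
    """Greedy leader clustering.

    fingerprints maps a compound id to a set of fingerprint bits. Each
    compound joins the first cluster whose leader is at least `threshold`
    similar; otherwise it starts a new cluster. The threshold value is an
    assumption of this sketch. Returns a list of (leader_id, member_ids).
    """
    clusters = []
    for cid, bits in fingerprints.items():
        for leader, members in clusters:
            if tanimoto(fingerprints[leader], bits) >= threshold:
                members.append(cid)
                break
        else:
            clusters.append((cid, [cid]))
    return clusters
```

Because the result of a greedy scheme depends on input order and threshold, rerunning it with slightly different parameters, as the text describes for the original analysis, is a reasonable way to check that the clusters are representative.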
The qHTS method allowed definition of a pharmacological profile of CYP activity with respect to libraries that included drugs, un-optimized commercially available compounds, and combinatorial collections. A key advantage of this database derives from a single series of experiments using a bioluminescent assay format in a manner where potency was determined for every compound and CRCs could be categorized to define activity, facilitating direct comparisons of results between isozymes. The database should aid in constructing and testing new predictive models of CYP activity.
We recognize that comparison of trends between isozymes provided here must be treated with care, as different (although similar) probe substrates were used for the various isozymes, and this has previously been shown to influence observed effects on CYP activity.22–24 Overall, the bioluminescent assays demonstrated a correlation similar to CYP fluorescent assays when compared to conventional methods (e.g. analytical detection of products; see Online Methods). Excellent correlations were observed for CYP 1A2, 2C9, and 2D6, while CYP 2C19 and 3A4 also performed well but were less well correlated (Supplementary Fig. 6 online). Inhibitory activity in the assay may be due either to compounds acting as substrates or inhibitors, and some weak-binding substrates may be classified as “inactive” (highest testing concentration = 57 µM). As no pre-incubation of compound with CYP was included, this database will be less sensitive towards time-dependent inhibitors and will miss mechanism-based inhibitors. To assess whether the potencies observed are clinically significant, we compared the IC50s to the Cmax value for approximately 140 drugs showing inhibition at one or more of the CYPs. From this analysis, and based on FDA guidelines, we estimate that DDIs are probable for approximately 20% of the study drugs showing inhibition, although the FDA criterion (DDIs probable with [I]/KI > 0.1) is stricter than what is typically applied in early optimization efforts (Supplementary Fig. S7 online).
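The clinical-relevance check described above can be sketched as a ratio test. The sketch assumes that Cmax stands in for the inhibitor concentration [I] and that the measured IC50 stands in for KI, which is a simplification; the function names and the use of micromolar units throughout are also assumptions.

```python
def inhibition_ratio(cmax_um, ic50_um):
    """Approximate [I]/KI using Cmax for [I] and IC50 for KI (both in uM)."""
    return cmax_um / ic50_um


def ddi_probable(cmax_um, ic50_um, threshold=0.1):
    """Apply the [I]/KI > 0.1 criterion quoted in the text."""
    return inhibition_ratio(cmax_um, ic50_um) > threshold


def fraction_flagged(drugs, threshold=0.1):
    """Fraction of (cmax_um, ic50_um) pairs exceeding the threshold."""
    flags = [ddi_probable(cmax, ic50, threshold) for cmax, ic50 in drugs]
    return sum(flags) / len(flags)
```

For a drug tested against several isozymes, the most potent IC50 would give the most conservative call under this criterion.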
The CYP gene family has evolved to cover a wide range of chemical structures and we observed activity (30–78%) for each of the five isozymes in this study. We found that 93% of the MLSMR and 72% of the biofocused compounds were active against at least one isozyme. However, we found differences in the amount of activity between the MLSMR and drug collections. For example, pan-inactive compounds were nearly five times more prevalent in the drug set than in the MLSMR. CYP 2D6 and CYP 1A2 showed a different selectivity than the other three CYP isozymes, although this selectivity does not discriminate well between the MLSMR subset of compounds and the drugs. Two isozymes, CYP 2C9 and 2C19, showed selectivity between drugs and the MLSMR. It has been suggested that CYP 3A4 is the most prominent P450 isozyme in drug metabolism and hepatic distribution (Fig. 2b),25, 26 but the drugs in our collection do not appear to have been optimized away from this activity. There has also been speculation that the CYP 2D6 isozyme plays a prominent role in drug metabolism,27 but little difference in the frequency of activity was observed between diversity compounds and approved drugs for this isozyme. Our data show drugs to be more potent against CYP 2D6 than the unoptimized compounds from the MLSMR, indicating that CYP 2D6 activity has not been a historical consideration in drug optimization efforts. Therefore, while activity against any or all CYP isozymes should be considered during lead optimization, the analysis provided here suggests that historically drugs have been particularly optimized against the CYP 2C9 and 2C19 isozymes. Taken together, the CYP 2C family shows similar involvement in drug metabolism to CYP 3A4,28 and CYP 2C9 shows a hepatic expression level similar to CYP 3A4 (Fig. 2b).
Comparison of bulk compound properties between actives and inactives showed a slight differential for ALogP and LogSw (Supplementary Fig. 8 online), consistent with some trends shown in Figure 4 and Figure 5, such as the prevalence of aliphatic alcohol or charged groups among pan-inactive compounds. However, analysis of compound fragments also showed isozyme-selective substructures. Additionally, biochemicals such as steroids and purines were among the less active fragments, consistent with these five isozymes being primarily involved in xenobiotic metabolism. When examining common ring systems we observed more divergent activity, with oxolanes showing a preference for pan-inactivity whereas similar rings such as thiophenes or furans shifted toward pan-activity.
Many computational strategies have been advanced towards predictive CYP isozyme activity models.29–33 Several factors have been implicated in the limits of their success; foremost amongst these is the lack of a single large, diverse dataset of CYP isozyme activities.34 It will be of great interest to see if the dataset described here, available in PubChem, can fuel the development of more robust CYP activity models.
This research was supported by the Molecular Libraries Initiative of the NIH Roadmap for Medical Research and the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. Work in Trinity College Dublin was supported by Enterprise Ireland, the Chemical Computing Group, OpenEye Scientific and Accelrys. We thank Sean Jefferies and Giorgio Carta for helpful discussions, Sam Michael and Carleen Klumpp for help with robotic automation of the assays, and Paul Shinn for preparation of compound dilutions and library plates.
Author contributions: H.V. collected experimental data; H.V., N.S., R.H., T.J., D.F., N.A., M.S., D.G.L., and D.S.A. performed analysis; H.V., N.S., T.J., D.F., R.H., D.G.L., J.I., C.P.A., and D.S.A. wrote the paper.