The overall screening process is outlined in Figure 3. We decided to run 2D substructure searches and 3D structure-based virtual screening as two parallel threads. The rationale for this choice was to combine hits from an ad hoc approach based on a medicinal chemist’s judgment with those from a computational approach taking direct advantage of the available protein structure. More specifically, the ad hoc approach may provide ligands whose binding mode and affinity cannot be adequately assessed by virtual screening techniques. In contrast, a structure-based approach yields hits along with a sound hypothesis about their binding mode, thus providing immediate guidance for structural modifications that may improve potency.
Figure 3 Overall screening process chart. Numbers in boxes are counts of compounds in the specified category. “2D selected” and “3D selected” are respective outcomes of automated and ad hoc selections as described in the Materials …
MBT domains represent a unique class of methyl-lysine binders. For instance, unlike most other domains belonging to the Royal family and recognizing Kme3, MBT domains bind lower methylation states (i.e., Kme1 and Kme2). Moreover, MBT domains recognize their respective histone methylation sites using a “cavity-insertion” mode, which buries the Kme side chain within a deep cleft, as opposed to a sequence-dependent “surface-groove” mode, which involves a wider methyl-lysine-binding pocket [8]. MBT domains have a highly conserved architecture featuring an “aromatic cage” that includes Phe379, Trp382 and Tyr386 (numbering for L3MBTL1), as shown in Figure 4.
Figure 4 Methyl lysine binding site of hL3MBTL1 (light green) in complex with an H4 histone 10-mer peptide (dark green) [PDB code: 2RJF]. The surface (light gray) outlines the methyl-lysine-insertion cavity.
These aromatic residues are involved in cation-π interactions with the methylated ammonium group, while a highly conserved Asp355 forms an ionic bond and is critical for recognition of the lower methylation marks. For instance, in the 3 human MBT domains known to bind Kme (D2-hL3MBTL1, D2-hSCML2 and D4-L3MBTL2) [25], Phe379, Trp382 and Asp355 are conserved in all of them, while Tyr386 is conserved in 2 domains (mutated to Phe in D2-hSCML2). Hence, given the high degree of pocket conservation, we chose hL3MBTL1 as a representative member of the MBT family for the current virtual screening study, expecting that some of the identified virtual hits will also be active on other family members.
The critical importance of the Kme cavity insertion, combined with the relatively low impact of peptide sequence [26], prompted us to start our hit fishing with a minimalist hypothesis: a set of close methyl-lysine side-chain mimics might be a good starting point for an experimental screening study, with some hope that the non-Kme-mimic portion of the selected molecules would serendipitously provide additional binding interactions. Searching CHEMNAV_5.9M using C3NMe and C3 (see ) as substructure queries resulted in 1,199 hits. We then applied a redundancy reduction procedure, which consisted of clustering the hits into very compact (in terms of internal similarity) clusters and selecting one central compound per cluster. The resulting set of 344 cluster centers was grouped into 288 families featuring common Murcko frameworks [27]. The families were then subjected to an ad hoc selection based on a combination of physical and structural properties that determine their lead-like potential. This analysis yielded 35 compounds, some of which were supplemented by close structural analogs, resulting in a final list of 50 compounds. Some of those compounds were then excluded based on price and, upon purchase, on QC analysis, leaving an experimentally tested set of 36 methyl lysine analogs.
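The redundancy reduction step described above can be illustrated with a minimal Python sketch. The text does not state which clustering algorithm, fingerprint, or similarity threshold was used, so everything here is an assumption made for illustration: a greedy leader clustering on Tanimoto similarity, fingerprints represented as plain sets of "on" bits, a threshold of 0.5, and the "central" compound taken as the member with the highest mean similarity to the rest of its cluster. The compound names and bit sets are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def leader_clusters(fps, threshold):
    """Greedy leader clustering (an assumed stand-in for the unspecified method).

    Each compound joins the first cluster whose leader is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of ids; the first id is the leader
    for cid, fp in fps.items():
        for members in clusters:
            if tanimoto(fp, fps[members[0]]) >= threshold:
                members.append(cid)
                break
        else:
            clusters.append([cid])
    return clusters


def central_member(members, fps):
    """Pick the member with the highest mean similarity to the other members."""
    def mean_similarity(cid):
        others = [m for m in members if m != cid]
        if not others:
            return 1.0
        return sum(tanimoto(fps[cid], fps[m]) for m in others) / len(others)
    return max(members, key=mean_similarity)


# Hypothetical fingerprints for five hits (sets of on-bit indices).
fps = {
    "a": {1, 2, 3, 4},
    "b": {1, 2, 3, 4, 5},
    "c": {1, 2, 3, 5},
    "d": {10, 11, 12},
    "e": {10, 11, 12, 13},
}

clusters = leader_clusters(fps, threshold=0.5)
centers = [central_member(members, fps) for members in clusters]
print(clusters, centers)
```

A higher threshold gives more compact clusters and therefore more cluster centers. Grouping the centers by Murcko framework would need a cheminformatics toolkit and is not shown here.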
In addition to the substructure search with restrictive queries, we also intended to take more direct advantage of the crystal structure. However, we judged that applying a docking method to 5 million compounds would not be an adequate solution. Indeed, in the absence of a diverse set of known binders, it would not be possible to validate the docking/scoring protocol, leading to a higher rate of false positives, particularly because the relatively shallow binding cavity would be only partially occupied by the majority of ligands, which would still score highly because of their propensity to readily form geometrically accurate hydrogen bonds with solvent-exposed residues [28, 29].
By contrast, a pharmacophore approach enables the identification of ligands possessing functional features characteristic of an active compound, implying that they bind the target similarly to the prototypic active. Therefore, a pharmacophore may serve as an efficient filter to select ligands that are likely to bind in a similar fashion to the histone peptide in X-ray structures. Docking/scoring of the pharmacophore hits in the protein binding site then plays a complementary role, providing an accurate assessment of steric and van der Waals interactions.
The pharmacophore model was built using the crystal structure of L3MBTL1 in complex with a co-crystallized histone peptide [2RJF] (as described in Materials and Methods). Pharmacophore screening of CHEMNAV_5.9M resulted in 20,078 hits, which represents an affordable workload for the downstream docking/scoring without any additional filtering. Docking of pharmacophore hits was performed using Glide at Standard Precision as described in Materials and Methods. A total of 60,126 poses (20,055 ligands) had a G-score < 0 kcal/mole. To be consistent with the rationale of a sequential pharmacophore-docking protocol, we retained only the 16,830 poses (8,947 ligands) that interact with Asp355 and Asn358 (the interactions on which our pharmacophore model is based). To choose a statistically significant G-score cutoff, we used the probability density distribution of G-score values obtained by docking a set of 10,000 decoys. These decoys were randomly selected from 334,992 commercially available compounds having physical profiles similar to those of the pharmacophore hits (i.e., one positive ionizable group, ≥2 HBA and ≥1 HBD). Our assumption was that a random selection from a broad compound set would have a distribution of G-scores characteristic of inactive compounds and would be indicative of the false positive rate at a given G-score value. Because the distribution is clearly asymmetric, we did not assume any analytical form and used a non-parametric kernel density estimator (with Gaussian kernels). The distribution (see Figure 5) shows that inactive compounds are quite unlikely to have a G-score > 5.5 kcal/mole when interacting with the binding site of L3MBTL1; this value may therefore be set as a threshold to select docking hits.
Figure 5 The probability density distribution extrapolated from G-score values resulting from docking of 10,000 random compounds having physical properties similar to those of pharmacophore hits. Although this set may accidentally contain some actives, the distribution …
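The idea of reading a score cutoff off a decoy distribution can be sketched in Python. This is a minimal illustration, not the protocol used in the study: the bandwidth rule (a normal-reference rule), the ten decoy scores, and the function names are all assumptions. The text does not report the bandwidth or the software used. Scores are treated as magnitudes, so that larger means better, following the wording of the threshold above.

```python
import math
import statistics


def reference_bandwidth(samples):
    """Normal-reference bandwidth, 1.06 * sd * n^(-1/5) (assumed, not from the text)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-1 / 5)


def kde_density(x, samples, h):
    """Gaussian kernel density estimate at x."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def kde_tail_probability(threshold, samples, h):
    """Probability mass of the estimated density above `threshold`.

    For Gaussian kernels the integral is available in closed form: each
    kernel contributes the upper tail of a normal centred on its sample.
    """
    scale = h * math.sqrt(2.0)
    return sum(0.5 * math.erfc((threshold - s) / scale) for s in samples) / len(samples)


# Hypothetical decoy score magnitudes (kcal/mole); real input would be the
# scores of the 10,000 docked decoys.
decoy_scores = [2.1, 2.8, 3.0, 3.3, 3.5, 3.6, 3.9, 4.1, 4.4, 5.2]

h = reference_bandwidth(decoy_scores)
p_tail = kde_tail_probability(5.5, decoy_scores, h)
print(f"bandwidth = {h:.3f}, estimated P(decoy score > 5.5) = {p_tail:.4f}")
```

Under the stated assumption that decoys behave like inactives, the tail probability at a candidate cutoff is a rough estimate of the false positive rate, so a cutoff can be chosen by lowering it until that estimate reaches an acceptable level.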
The 168 primary hits (with G-score > 5.5 kcal/mole) were clustered into families of structurally related compounds. Poses of the best-scoring representatives of each of the 36 clusters were reviewed within the protein binding site. Only poses having at least 2 hydrogen bonds with the protein, in addition to the required ionic bond, were retained. Finally, 17 ligands representative of 4 clusters were selected as candidates for purchase.
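The successive pose filters described above (required contacts, score cutoff, hydrogen-bond count) can be summarized in a short Python sketch. It is a simplification under stated assumptions. The `Pose` record and its fields are hypothetical and not an export format of any docking program. The score is stored as a magnitude so that larger is better. The hydrogen-bond criterion is applied to every pose, whereas in the study it was applied during visual review of cluster representatives. The clustering step is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Hypothetical record for one docked pose."""
    ligand_id: str
    score: float          # score magnitude in kcal/mole; larger is better (assumption)
    contacts: frozenset   # residues the pose interacts with
    n_hbonds: int         # hydrogen bonds in addition to the ionic bond


# Residues the pharmacophore model is based upon, as stated in the text.
REQUIRED_CONTACTS = frozenset({"Asp355", "Asn358"})


def select_poses(poses, score_cutoff=5.5, min_hbonds=2):
    """Return the best surviving pose per ligand after all three filters."""
    best = {}
    for pose in poses:
        if not REQUIRED_CONTACTS <= pose.contacts:
            continue  # lacks a pharmacophore-defining interaction
        if pose.score <= score_cutoff:
            continue  # not separated from the decoy distribution
        if pose.n_hbonds < min_hbonds:
            continue  # too few hydrogen bonds with the protein
        current = best.get(pose.ligand_id)
        if current is None or pose.score > current.score:
            best[pose.ligand_id] = pose
    return best


# Hypothetical poses illustrating each rejection reason.
poses = [
    Pose("lig1", 6.2, frozenset({"Asp355", "Asn358", "Tyr386"}), 2),
    Pose("lig1", 5.9, frozenset({"Asp355", "Asn358"}), 3),
    Pose("lig2", 7.0, frozenset({"Asp355"}), 2),            # misses Asn358
    Pose("lig3", 5.1, frozenset({"Asp355", "Asn358"}), 2),  # below the cutoff
    Pose("lig4", 6.0, frozenset({"Asp355", "Asn358"}), 1),  # too few H-bonds
]

selected = select_poses(poses)
print(sorted(selected))
```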
In the end, a total of 51 compounds (36 resulting from the 2D search and 15 from virtual screening) were actually screened against our panel of 4 MBT-containing proteins, i.e., L3MBTL1, L3MBTL3, L3MBTL4 and MBTD1 (see supporting material for a complete SD file of experimentally tested samples). To make sure that even weakly active compounds were identified, all 51 compounds were submitted to dose-response AlphaScreen™ experiments in a concentration range of 5 nM to 100 μM. Nineteen of the 51 tested compounds (Figure 6) demonstrated an unambiguous dose-dependent effect in this assay (Table 1). Fourteen of these experimental hits are among the 36 compounds identified by the substructure search, while 5 come from the set of 15 pharmacophore/docking hits. Figure 7 shows the dose-response curves and structures of the most potent compounds from each hit category, as well as the highest scored pose of the most potent docking hit, which has a pyrrolidine moiety interacting with Asp355 (in place of the mono- or di-methylated ammonium). The two hit categories are complementary in terms of their potential for future chemical optimization. For example, some of the most potent 2D hits (1) selectively bind to a single MBT-containing protein from our panel. However, the binding mode of these hits cannot be reliably hypothesized and many of them cannot be mapped to our pharmacophore model. Conversely, the pharmacophore/docking hits may be readily mapped to the pharmacophore, and thus their binding mode to most MBT domains may be hypothesized with high confidence. It remains unexplained why the structure-based hits are selective for one or two of the four MBT-containing proteins on our screening panel even though they all possess a pharmacophore that should confer an ability to bind any MBT domain. This selectivity is reassuring in light of future chemical optimization, and its structural rationale should become clearer when more ample structure-activity data are available. The structure-based hits also provide evidence that Kme1 or Kme2 moieties are not the only functional groups capable of binding the MBT aromatic cage. For example, the pyrrolidine-containing compound 13 is one of the most potent (IC50 = 17 μM) among the screened compounds. Also, compound 15, which shows some activity against L3MBTL3 (IC50 = 54 μM), has a rigid alkyne linker instead of a lysine-like alkane chain.
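To give a feel for why a top concentration of 100 μM is enough to reveal hits with IC50 values in the tens of micromolar, the following Python sketch evaluates a simple Hill-type dose-response model. The model form and the Hill slope of 1 are assumptions made purely for illustration; the text does not state how the dose-response curves were fitted. The function name is hypothetical, and the two IC50 values are those quoted above for compounds 13 and 15.

```python
def fractional_inhibition(conc_um, ic50_um, hill=1.0):
    """Fraction of signal inhibited at a given concentration (Hill model, assumed)."""
    return conc_um ** hill / (conc_um ** hill + ic50_um ** hill)


top_conc_um = 100.0  # highest concentration in the reported range

# IC50 values (in μM) reported in the text for compounds 13 and 15.
for name, ic50_um in [("compound 13", 17.0), ("compound 15", 54.0)]:
    inhibition = fractional_inhibition(top_conc_um, ic50_um)
    print(f"{name}: {inhibition:.0%} inhibition expected at {top_conc_um:g} μM")
```

Under this model a compound reaches 50% inhibition exactly at its IC50, so any hit with an IC50 below the top concentration is expected to show more than half-maximal inhibition at the highest dose tested.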
Figure 6 Structures of 19 virtual hits confirmed by AlphaScreen.
Table 1 IC50 results for the AlphaScreen hits against four MBT-containing proteins. The Source column indicates whether a hit was identified by substructure searches (2D) or by the structure-based approach (SB). In case of multiple dose-response experiments, …
Figure 7 (a, b) AlphaScreen dose-response curves for the two most potent hits coming from substructure searches (compound 2) and the pharmacophore/docking protocol (compound 13). (c) The highest scored docking pose (magenta sticks) of compound 13 is superposed with …
It is noteworthy that one of the substructure-search hits is Maprotiline (3), an approved drug and strong norepinephrine uptake inhibitor that is also active against a broad set of aminergic G-protein coupled receptors (GPCRs). Consequently, Maprotiline, in addition to its known biological properties, may also have some chromatin-related activity, although its affinity for L3MBTL1 is 3 orders of magnitude lower than its affinity for its primary target and may be of little pharmacological relevance.
The overall SAR for the identified hits from both categories is quite flat (5.7 to 96 μM). This may be explained by the current binding mode hypothesis, which implies that a large portion of each hit molecule is exposed to solvent. Additionally, the potency of the currently identified hits is certainly insufficient to consider them as probe [30] candidates, and they will be subject to further chemical optimization. The upcoming optimization will target a more substantial “burying” of the ligand in the MBT binding pocket. Possible directions include modifications of the linker between the deeply buried amino group and the outer aromatic motif as well as ortho substitutions on the outer aromatic group (e.g., ortho substituted compound 13