The Malignant Brain Tumor (MBT) repeat is an important epigenetic-code “reader” and is functionally associated with differentiation, gene silencing and tumor suppression1–3. Small molecule probes of MBT domains should enable a systematic study of MBT-containing proteins, and potentially reveal novel druggable targets. We designed and applied a virtual screening strategy, which identified potential MBT antagonists in a large database of commercially available compounds. A small set of virtual hits was purchased and submitted to experimental testing. Nineteen of the purchased compounds showed a specific dose-dependent protein binding and will provide critical structure-activity information for subsequent lead generation and optimization.
Epigenetics refers to the study of heritable changes in gene function that occur without a change in the DNA sequence4. Epigenetic mechanisms of gene activation and inactivation permit specialization of function between cells even though each cell contains essentially the same genetic code5. Typically, changes in the environment trigger post-translational modifications of histone proteins and DNA (“epigenetic marks”) including histone lysine and arginine methylation, lysine acetylation, DNA cytosine methylation, and histone sumoylation, ubiquitination, ADP-ribosylation and phosphorylation6. These epigenetic modifications form spatial arrangements (often referred to as the “epigenetic code”), which recruit protein complexes (epigenetic-code “readers”) that cause chromatin to wind and unwind in order to control access of transcription factors to genes. Specific molecular mechanisms by which the “reader” proteins alter gene activation are a subject of intense investigation7, 8 and small-molecule switches selectively disrupting critical protein-protein interactions would significantly contribute to the ongoing research.
Malignant Brain Tumor (MBT) domains represent an important class of “code readers” whose function is probably the least understood of this group. From a physiological perspective, MBT proteins are associated with chromatin condensation and act to repress the transcription of genes, ultimately affecting processes such as differentiation, mitotic progression and tumor suppression1–3, 9.
Structurally, MBT repeats are similar to the “Royal Family” histone-binding proteins10 and predominantly recognize mono- and di-methylated lysines (Kme1 and Kme2)11. To date, 9 human proteins containing a total of 27 different MBT domains have been identified, demonstrating the complex precision with which this specific family of histone-binding proteins regulates chromatin accessibility. Therefore, the development of potent and selective small-molecule probes for each of the human MBT proteins would facilitate a greater understanding of their roles in stem cell differentiation, cellular reprogramming and disease etiology. A substantial body of structural information, which is currently available on many MBT domains12–16, opens an avenue for rational approaches to the probe-generation effort for this fascinating target class.
Here, we employed a virtual screening strategy to discover non-peptide, cell-penetrant probes for MBT-containing proteins. Indeed, database searching and ligand- or structure-based virtual screening have proved to be useful tools and have become an integral part of the drug discovery process in recent years. The virtual screening process mimics its experimental counterpart and is used to rank or filter large ligand databases in order to yield a compound set “enriched” in hits when experimentally screened. One of the most remarkable virtues of computer-aided approaches is their capacity to screen (i) targets with no assays amenable to an HTS format and (ii) compound collections not readily available for in-house experimental screens. In the search for MBT probes, we screened one of the most comprehensive databases of commercially available compounds, the iResearch Library (ChemNavigator)17, which by the end of 2008 contained more than 50 million procurable chemical samples. To this end, we employed two complementary approaches, one of which consisted of searching for compounds containing Kme1 and Kme2 side chains, while the other approach involved sequential application of pharmacophore and docking techniques, hence potentially resulting in more structurally remote compounds mimicking the peptide interaction mode.
A basic prerequisite for an efficient hit discovery process is an accurate, fast and cost-effective experimental screening technique capable of timely assessment of procured virtual hits. We have previously introduced a novel HTS assay making use of the AlphaScreen™ technology and this technique was employed as a primary experimental confirmation for the selected virtual hits.
The 2008.2 release of iResearch Library was obtained from ChemNavigator in SD format. Only the subset of 5,967,880 “sourceable” compounds was considered for screening. The structures of these compounds were further cleaned and filtered using the PipelinePilot software18. The cleaning protocol included salt stripping, mixture splitting, functional group standardization and charge neutralization. Ionizable compounds were then converted to their most probable charged forms at pH 7.4 using the LigPrep software19. The filtering process included a softened version of the Lipinski rule20 (2+ violations of Num H-donors<6, Num H-acceptors<12, MolWeight between 200 and 600, ALogP<5.5). The filtered set of 5,888,263 compounds (“CHEMNAV_5.9M”) was then used for 2D searches and analyses as well as a starting point for the 3D dataset generation. PipelinePilot was used for 3D conversion. Stereoisomers were systematically enumerated for chiral compounds with undefined chirality and fewer than 3 chiral centers. For chiral compounds with undefined chirality and more than 2 chiral centers, a single stereoisomer was produced at random. Compounds with more than 12 rotatable bonds were removed from the 3D set because they represent a substantial burden for both pharmacophore mapping and docking algorithms.
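The property filter described above can be illustrated with a short Python sketch. It is not the PipelinePilot protocol itself: the `Descriptors` container and the function names are hypothetical, the descriptor values are assumed to be precomputed elsewhere, the molecular-weight bounds are assumed to be inclusive, and the parenthetical “2+ violations” is read as “compounds with two or more violations are rejected”.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties of one compound (hypothetical container)."""
    h_donors: int
    h_acceptors: int
    mol_weight: float
    alogp: float
    rotatable_bonds: int


def count_violations(d: Descriptors) -> int:
    """Count how many of the four softened Lipinski criteria are violated."""
    criteria = [
        d.h_donors < 6,
        d.h_acceptors < 12,
        200 <= d.mol_weight <= 600,  # inclusive bounds are an assumption
        d.alogp < 5.5,
    ]
    return sum(1 for ok in criteria if not ok)


def passes_2d_filter(d: Descriptors) -> bool:
    """Assumed reading of the text: reject compounds with 2 or more violations."""
    return count_violations(d) < 2


def kept_in_3d_set(d: Descriptors) -> bool:
    """Compounds with more than 12 rotatable bonds are dropped from the 3D set."""
    return passes_2d_filter(d) and d.rotatable_bonds <= 12
```

Under this reading, a compound with a single violation (for example, 7 hydrogen-bond donors but otherwise compliant properties) is retained, which is what makes the rule “softened”.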
Substructure searches were performed by means of the Pipeline Pilot software on the CHEMNAV_5.9M database. Figure 1 shows the query substructures used in the search for structural analogs of the Kme1 and Kme2 side chains.
The pharmacophore was prepared using the Discovery Studio 2.5 software18. We made use of the high-resolution (2.05 Å) crystal structure of L3MBTL1 in complex with H4K20me2 (PDB code: 2RJF)13. The Kme2 and adjacent residues were used to define pharmacophoric features encoding 3 electrostatic-interaction sites (as shown in Figure 2): (i) hydrogen-bond donor (HBD) matching the H4-Lys20 backbone nitrogen interacting with Asn358, (ii) hydrogen-bond acceptor (HBA) of the H4-His18 backbone carboxyl interacting with Asn358, and (iii) amine cation involved in an ionic bond with Asp355. Furthermore, the non-hydrogen atoms of the aromatic residues of L3MBTL1 forming the aromatic cage around the histone-lysine side chain were used to define 16 exclusion spheres. The precision spheres of the pharmacophoric features (i.e., regions of space to which a virtual hit should fit to) were set to 1 Å.
The pharmacophoric screening of the small-molecule set of procurable compounds was performed by means of the Catalyst module of the Discovery Studio 2.5 suite18. To this end the 3D SD ligand file was converted to a multi-conformer Catalyst database. The conformers were sampled using the BEST method allowing up to 100 conformers per molecule. The enumerated conformers from the Catalyst database were then rigidly fitted against the pharmacophore.
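The geometric test behind a pharmacophore match can be sketched as follows. This is a simplified illustration, not the Catalyst fitting algorithm: it assumes the conformer has already been placed in the coordinate frame of the pharmacophore, it treats the 1 Å precision as a sphere radius, and the exclusion-sphere radius (`exclusion_radius`) is a hypothetical parameter because the text does not state it. All names and coordinates are illustrative.

```python
import math


def matches_pharmacophore(features, exclusion_centers, ligand_features,
                          ligand_atoms, tolerance=1.0, exclusion_radius=1.5):
    """Return True if a pre-aligned conformer satisfies the pharmacophore.

    features          -- list of (kind, (x, y, z)) pharmacophore features
    exclusion_centers -- list of (x, y, z) exclusion-sphere centers
    ligand_features   -- list of (kind, (x, y, z)) feature points of the conformer
    ligand_atoms      -- list of (x, y, z) heavy-atom positions of the conformer
    tolerance         -- feature sphere radius in Å (1 Å, read as a radius)
    exclusion_radius  -- exclusion sphere radius in Å (assumed value)
    """
    # Every pharmacophore feature needs a ligand feature of the same kind
    # inside its tolerance sphere.
    for kind, center in features:
        if not any(k == kind and math.dist(center, p) <= tolerance
                   for k, p in ligand_features):
            return False
    # No ligand heavy atom may penetrate an exclusion sphere.
    for center in exclusion_centers:
        if any(math.dist(center, atom) < exclusion_radius for atom in ligand_atoms):
            return False
    return True
```

In the screening described above, such a test would be applied to each of the up to 100 stored conformers of a molecule, and the molecule would be retained if at least one conformer matched.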
A high-resolution (2.05 Å) crystal structure of L3MBTL1 in complex with H4K20me2 (PDB code: 2RJF)13 was selected and used at the docking stage. The corresponding PDB file was processed as follows. Hydrogen atoms were added to the protein, the active site was visually inspected and the appropriate corrections were made for tautomeric states of histidine residues, orientations of hydroxyl groups, and protonation states of basic and acidic residues. The hydrogen atoms were energy minimized in the MMFF force field21 using the Macromodel software with the Maestro graphics interface19 with all the non-hydrogen atoms constrained to their original positions.
Small-molecule structures were docked into the active site of the target protein using the Glide program19, 22 in standard docking precision (Glide SP). The binding region was defined by a 10 Å × 10 Å × 10 Å box centered on the Kme2 side chain of the co-crystallized histone peptide. A scaling factor of 0.8 was applied to the van der Waals radii. Default settings were used for all the remaining parameters. The top 10 poses were generated for each ligand. The docking poses were then energy minimized with Macromodel in the OPLS2001 force field23, with flexible ligand and rigid receptor. The refined poses were re-ranked based on the calculated interaction energy. The lowest-energy pose for each ligand was selected and rescored in the active site using GlideScore, and the compounds were ranked accordingly.
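The pose post-processing logic (keep the lowest-energy refined pose per ligand, then rank ligands by the rescored value of that pose) can be sketched as below. The `Pose` record and function names are hypothetical, the numerical values would come from the docking and minimization software, and the default assumption that a lower score is better is a convention choice exposed as a parameter rather than a statement about the software.

```python
from typing import NamedTuple


class Pose(NamedTuple):
    ligand: str     # ligand identifier
    energy: float   # interaction energy of the refined (minimized) pose
    score: float    # score of the same pose after rescoring


def best_pose_per_ligand(poses):
    """Keep, for each ligand, the refined pose with the lowest interaction energy."""
    best = {}
    for pose in poses:
        current = best.get(pose.ligand)
        if current is None or pose.energy < current.energy:
            best[pose.ligand] = pose
    return best


def rank_ligands(poses, lower_score_is_better=True):
    """Rank ligands by the rescored value of their selected pose."""
    selected = best_pose_per_ligand(poses).values()
    return sorted(selected, key=lambda p: p.score,
                  reverse=not lower_score_is_better)
```

Note that the pose retained for a ligand is the one with the lowest interaction energy, which is not necessarily the pose with the best score.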
The Pipeline Pilot software18 was used for the whole process of hit analysis and selection at all screening steps. Diversity-based selections were generally performed in two steps. First, an automated redundancy reduction was performed by selecting a single representative of each small similarity-based cluster. Compounds in such a cluster had to be similar at ≥65% (Tanimoto with ECFP6 fingerprints). In the second step, compounds were clustered into broader families by means of the Maximum Common Substructure (MCS) method. Twenty to 50% of compounds were then selected from each cluster in such a way that larger clusters contributed smaller percentages. The output ligands were aligned to their respective MCS to facilitate an ad hoc selection.
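The two selection steps can be illustrated with a small Python sketch. Several points are assumptions made only for illustration: fingerprints are represented as sets of "on" bits, the redundancy reduction uses a greedy leader-style pass (the clustering algorithm is not specified in the text), and the per-cluster selection fraction is interpolated linearly between 50% for the smallest cluster and 20% for the largest (the text gives only the range and the direction of the trend).

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def reduce_redundancy(fingerprints: dict, threshold: float = 0.65) -> list:
    """Greedy pass: keep a compound only if it is less similar than the
    threshold to every representative kept so far (assumed algorithm)."""
    representatives = []
    for name, fp in fingerprints.items():
        if all(tanimoto(fp, fingerprints[r]) < threshold for r in representatives):
            representatives.append(name)
    return representatives


def selection_fraction(size: int, min_size: int, max_size: int,
                       low: float = 0.20, high: float = 0.50) -> float:
    """Fraction of a cluster to select; larger clusters get smaller fractions.
    Linear interpolation between `high` and `low` is an assumption."""
    if max_size == min_size:
        return high
    t = (size - min_size) / (max_size - min_size)
    return high - t * (high - low)
```

With these assumptions, a cluster of intermediate size receives an intermediate fraction, for example 35% for a cluster halfway between the smallest and largest sizes.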
The quality control of the plate containing the screened compounds was performed by diluting a 1 μL DMSO stock solution (100 μM concentration) with 29 μL of MeOH. The sealed plate was directly used to inject 5 μL for each well. HPLC data of all compounds were acquired using an Agilent 6110 Series system with the UV detector set to 220 nm. Samples were injected onto an Agilent Eclipse Plus 4.6 × 50 mm, 1.8 μm, C18 column at room temperature. The mobile phase consisted of H2O + 0.1% acetic acid (A) and MeOH + 0.1% acetic acid (B). A linear gradient from 10% to 100% B in 5.0 min was followed by pumping 100% B for another 2 minutes at a flow rate of 1.0 mL/min. Mass spectrometry (MS) data were acquired in positive ion mode using an Agilent 6110 single quadrupole mass spectrometer with an electrospray ionization (ESI) source.
The purity of the screened compounds was found to be 95% or higher.
The constructs containing MBT repeats for L3MBTL1 (residues 200 to 530), L3MBTL3 (residues 225 to 555), L3MBTL4 (residues 44 to 371) and MBTD1 (residues 130 to 566) cloned into a pET28a-MHL plasmid and transfected into BL21 DE3 E. coli were generously provided by the Structural Genomics Consortium and purified as previously described24. The following additional peptides were synthesized and high-performance liquid chromatography (HPLC) purified by the Tufts Peptide Synthesis Core Facility (Boston, MA) to act as substrates for L3MBTL3, L3MBTL4 and MBTD1. A peptide representative of monomethyl lysine 36 on histone H2A (H2AK36me) with the sequence Biotin-AHA-GRVHRLLRK(me)GNYSER-COOH was used as a substrate for L3MBTL3 and L3MBTL4, and a peptide representative of H4K20me with the sequence Biotin-AHA-KGGAKRHRK(me)VLRDNIQ-COOH was used as a substrate for MBTD1. Here and further in the text, (me) denotes the site of the monomethylated lysine, AHA indicates the inclusion of a 6-aminohexyl linker between the N-terminal residue and the biotin group, and COOH indicates a free carboxylic acid on the C-terminus.
Compounds for the dose-response runs were resuspended to a concentration of 100 mM in DMSO in barcoded glass vials and sonicated using a Covaris XX (Covaris, Woburn, MA). The compounds were plated as 3-fold dilutions over 10 points using a Tecan Genesis (Tecan, Männedorf, Switzerland) in 384-well V-bottom polypropylene microplates (Greiner, Monroe, NC). A Multimek NS X-1536 fitted with a 384-channel head (Nanoscreen, Charleston, SC) was used to spot 1 μL of the compounds into 384-well polypropylene V-bottom microplates that were sealed and stored at −20 °C. On the day of use, the compounds were prepared for screening by diluting 100-fold in 1X assay buffer, and 1 μL of the diluted titrations was spotted into 384-well Proxiplates to which 9 μL of protein and peptide cocktail was added to initiate the assay. The AlphaScreen™ assay was performed as previously described for L3MBTL124 with the following modifications for screening the other MBT proteins. L3MBTL3 was assayed at a concentration of 200 nM with 150 nM H2AK36me, L3MBTL4 was assayed at a concentration of 100 nM with 150 nM H2AK36me and MBTD1 was assayed at a concentration of 100 nM with 150 nM H4K20me. The binding of L3MBTL1, L3MBTL3 and MBTD1 to their cognate peptides was detected using 5 μg/mL AlphaScreen™ Nickel Chelate acceptor and streptavidin donor beads, and the interaction between L3MBTL4 and its cognate peptide was detected using 10 μg/mL of the same beads. Dose-response runs were analyzed using ScreenAble software (Screening Solutions LLC, Chapel Hill, NC), and IC50 values were calculated using 4-parameter fits, or using 3-parameter fixed-top fits as necessary. The counterscreen was performed to identify any compound interference with AlphaScreen™ signal transduction as previously described24 after the compounds were prepared as described above.
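The dilution arithmetic above can be checked with a short Python sketch. It assumes that the first point of the 10-point series is the undiluted 100 mM stock; under that assumption the final assay concentrations span roughly 5 nM to 100 μM. The second function shows one common form of the 4-parameter logistic model; the exact parameterization used by the ScreenAble software is not stated in the text, so this form is an assumption. Function and parameter names are illustrative.

```python
def final_concentrations_uM(stock_mM=100.0, n_points=10, fold=3.0,
                            buffer_dilution=100.0, spot_uL=1.0, added_uL=9.0):
    """Final assay concentrations (in μM) for the titration series.

    Assumes point 1 is the undiluted stock, followed by serial `fold`-fold
    dilutions, a `buffer_dilution`-fold dilution in assay buffer, and a
    final dilution from spotting `spot_uL` into `spot_uL + added_uL`.
    """
    well_dilution = (spot_uL + added_uL) / spot_uL       # 10-fold here
    top_uM = stock_mM * 1000.0 / (buffer_dilution * well_dilution)
    return [top_uM / fold ** i for i in range(n_points)]


def four_param_logistic(conc, bottom, top, ic50, hill):
    """Common 4-parameter logistic form (assumed): signal falls from `top`
    to `bottom` as concentration rises, with the midpoint at `ic50`."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


if __name__ == "__main__":
    concs = final_concentrations_uM()
    print(f"highest: {concs[0]:.1f} uM, lowest: {concs[-1] * 1000:.2f} nM")
```

A 3-parameter fixed-top fit corresponds to holding `top` constant (for example at the signal of the DMSO control) and fitting only `bottom`, `ic50` and `hill`.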
The overall screening process is outlined in Figure 3. We decided to process 2D substructure searches and 3D structure-based virtual screening as two parallel threads. The rationale for this choice was to combine hits from an ad hoc approach based on a medicinal chemist’s judgment with those from a computational approach taking direct advantage of the available protein structure. More specifically, the ad hoc approach may provide ligands whose binding mode and affinity cannot be adequately assessed by virtual screening techniques. Alternatively, a structure-based approach yields hits along with a sound hypothesis about their binding mode thus allowing immediate guidance to structural modifications which may improve potency.
MBT domains represent a unique class of methyl-lysine binders. For instance, unlike most other domains belonging to the Royal family and recognizing Kme3, MBT domains bind lower methylation states (i.e., Kme1 and Kme2). Moreover, MBTs recognize their respective histone methylation sites employing a “cavity-insertion” mode, which buries the Kme side chain within a deep cleft, as opposed to a sequence-dependent “surface-groove” mode, involving a wider methyl lysine-binding pocket8. MBT domains have a highly conserved architecture, an “aromatic cage” including Phe379, Trp382 and Tyr386 (numbering for L3MBTL1), as shown in Figure 4.
These aromatic residues are involved in cation-π interactions with the methylated ammonium group, while a highly conserved Asp355 forms an ionic bond and is critical for recognition of the lower methylation marks. For instance, in the 3 human MBT domains known to bind Kme (D2-hL3MBTL1, D2-hSCML2 and D4-L3MBTL2)25, Phe379, Trp382 and Asp355 are conserved in all of them, while Tyr386 is conserved in 2 domains (mutated to Phe in D2-hSCML2). Hence, given the high degree of pocket conservation, we have chosen hL3MBTL1 as a representative member of the MBT family for the current virtual screening study, expecting that some of the identified virtual hits will also be active on other family members.
The critical importance of the Kme cavity insertion, combined with the relatively low impact of peptide sequence26, prompted us to start our hit fishing with a minimalist hypothesis: a set of close methyl lysine side-chain mimics might be a good starting point for an experimental screening study, with some hope that the non-Kme mimic portion of the selected molecules would serendipitously provide additional binding interactions. Searching CHEMNAV_5.9M using C3CH2NMe and C3CH2NMe2 (see Figure 1) as substructure queries resulted in 1,199 hits. We then applied a redundancy reduction procedure, which consisted of clustering the hits into very compact (in terms of internal similarity) clusters and selecting one central compound per cluster. The resulting set of 344 cluster centers was grouped into 288 families featuring common Murcko frameworks27. The families were then subjected to an ad hoc selection, based on a combination of physical and structural properties that determine their lead-like potential. This analysis yielded 35 compounds, some of which were supplemented by close structural analogs, which resulted in a final list of 50 compounds. Some of those compounds were further excluded from the list based on price and, upon purchase, on QC analysis, which resulted in an experimentally tested set of 36 methyl lysine analogs.
In addition to the substructure search with restrictive queries, we also intended to take a more direct advantage of the crystal structure. However, we estimated that application of a docking method to 5 million compounds would not be an adequate solution. Indeed, in the absence of a diverse set of known binders, it would not be possible to validate the docking/scoring protocol, leading to a higher rate of false positives, particularly because the relatively shallow binding cavity will only be partially occupied by the majority of ligands, which would still be highly scored because of their propensity to readily form geometrically accurate hydrogen bonds with solvent-exposed residues28, 29.
Alternatively, a pharmacophore approach enables the identification of ligands possessing functional features characteristic of an active compound, implying that they bind the target similarly to the prototypic active. Therefore, a pharmacophore may serve as an efficient filter to select ligands that are likely to bind in a similar fashion to the histone peptide in x-ray structures. Docking/scoring pharmacophore hits in the protein binding site will then play a complementary role for an accurate assessment of steric and van der Waals interactions.
The pharmacophore model was built using the crystal structure of L3MBTL1 in complex with a co-crystallized histone peptide [2RJF] (as described in Materials and Methods). Pharmacophore screening of CHEMNAV_5.9M resulted in 20,078 hits, which represents an affordable workload for the downstream docking/scoring without any additional filtering. Docking of pharmacophore hits was performed using Glide at Standard Precision as described in Materials and Methods. A total of 60,126 poses (20,055 ligands) had a G-score <0 kcal/mole. To be consistent with the rationale of a sequential pharmacophore-docking protocol, we retained only those 16,830 poses (8,947 ligands) that interact with Asp355 and Asn358 (interactions that our pharmacophore model is based upon). In order to choose a statistically significant G-score cutoff, we made use of the probability density distribution of G-score values obtained by docking a set of 10,000 decoys. These decoys were randomly selected from 334,992 commercially available compounds having physical profiles similar to those of pharmacophore hits (i.e., one positive ionizable group, ≥2 HBA and ≥1 HBD). Our assumption was that a random selection from a broad compound set would have a distribution of G-scores characteristic of that of inactive compounds and would be indicative of the false positive rate at a given G-score value. Based on the clearly asymmetric nature of these distributions, we did not assume any analytical form and made use of a non-parametric kernel density estimator (with Gaussian kernels). The distribution (see Figure 5) shows that inactive compounds are quite unlikely to have a G-score > 5.5 kcal/mole when interacting with the binding site of L3MBTL1 and therefore this value may be set as a threshold to select docking hits.
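The decoy-based choice of a score cutoff can be illustrated with a minimal Gaussian kernel density estimator written in plain Python. This is a sketch under stated assumptions, not the procedure actually used: the bandwidth rule (a normal-reference rule, 1.06 σ n^(-1/5)) is an assumption because the text does not specify one, and the function computes the probability mass above a cutoff to follow the “G-score > 5.5” wording; for a convention in which lower scores are better, the complementary tail would be used. Function names are illustrative.

```python
import math
import statistics


def normal_reference_bandwidth(samples):
    """Bandwidth from a normal-reference rule (assumed; not given in the text)."""
    n = len(samples)
    return 1.06 * statistics.stdev(samples) * n ** (-1.0 / 5.0)


def kde_density(x, samples, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def tail_probability_above(cutoff, samples, h):
    """Estimated probability that a decoy scores above the cutoff.

    Each Gaussian kernel centered on a sample s contributes its own
    upper-tail mass, 0.5 * erfc((cutoff - s) / (h * sqrt(2))).
    """
    total = sum(0.5 * math.erfc((cutoff - s) / (h * math.sqrt(2.0)))
                for s in samples)
    return total / len(samples)
```

Given a list `decoy_scores` of docking scores for the decoy set, `tail_probability_above(5.5, decoy_scores, normal_reference_bandwidth(decoy_scores))` would estimate the fraction of presumed inactives expected to pass the threshold, which serves as a rough proxy for the false positive rate at that cutoff.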
The 168 primary hits (with G-score > 5.5 kcal/mole) were clustered into families of structurally related compounds. Poses of the best scored representatives of each of 36 clusters were reviewed within the protein binding site. Only poses having at least 2 hydrogen bonds, in addition to the required ionic bond, with the protein were retained. Finally, 17 ligands representative of 4 clusters were selected as candidates for purchase.
In the end, a total of 51 compounds (36 resulting from the 2D search and 15 from virtual screening) were actually screened against our panel of 4 MBT-containing proteins, i.e., L3MBTL1, L3MBTL3, L3MBTL4 and MBTD1 (see supporting material for a complete SD file of experimentally tested samples). To make sure that even weakly active compounds were identified, all 51 compounds were submitted to dose-response AlphaScreen™ experiments in a concentration range of 5 nM to 100 μM. Nineteen of the 51 tested compounds (Figure 6) demonstrated an unambiguous dose-dependent effect in this assay (Table 1). Fourteen of these experimental hits are part of the 36 compounds identified by the substructure search, while 5 come from the set of 15 pharmacophore/docking hits. Figure 7 shows the dose-response curves and structures of the most potent compounds from each hit category as well as the highest scored pose of the most potent docking hit having a pyrrolidine moiety interacting with Asp355 (in place of mono- or di-methylated ammonium). The two hit categories are complementary in terms of their potential for future chemical optimization. For example, some of the most potent 2D hits (1, 2 and 6) selectively bind to a single MBT-containing protein from our panel. However, the binding mode of these hits cannot be reliably hypothesized and many of them cannot be mapped to our pharmacophore model. Conversely, the pharmacophore/docking hits may be readily mapped to the pharmacophore and thus their binding mode to most MBT domains may be hypothesized with high confidence. It remains unexplained why the structure-based hits are selective for one or two of the four MBT-containing proteins on our screening panel even though they all possess a pharmacophore that should confer an ability to bind any MBT domain.
This selectivity is reassuring in light of future chemical optimization, and its structural rationale should become clearer when more ample structure-activity data are available. The structure-based hits also provide evidence that Kme1 or Kme2 moieties are not the only functional groups capable of binding the MBT aromatic cage. For example, the pyrrolidine-containing compound 13 is one of the most potent (IC50=17 μM) among the screened compounds. Also, compound 15, which shows some activity against L3MBTL3 (IC50=54 μM), has a rigid alkyne linker instead of a lysine-like alkane chain.
It is noteworthy that one of the substructure-search hits is Maprotiline (3), an approved drug and strong norepinephrine uptake inhibitor, also active against a broad set of aminergic G-protein coupled receptors (GPCR). Consequently, Maprotiline, in addition to its known biological properties, may also have some chromatin-related activity, although the affinity to L3MBTL1 is 3 orders of magnitude lower than the affinity to its primary target and may be of little pharmacological relevance.
The overall SAR for identified hits from both categories is quite flat (5.7 to 96 μM) and may be explained by the current binding mode hypothesis, which implies that a large portion of each hit molecule is exposed to solvent. Additionally, the potency of the currently identified hits is insufficient to consider them as probe30 candidates, and they will be subject to further chemical optimization. The upcoming optimization will target a more substantial “burying” of a ligand in the MBT binding pocket. Possible directions would include modifications of the linker between the deeply buried amino group and the outer aromatic motif as well as ortho substitutions on the outer aromatic group (e.g., ortho-substituted compound 13 analogs).
In silico approaches have matured to become an established source of novel and diverse chemical tools to study and exploit the pharmaceutical potential of novel biological targets. Here we applied a combination of computational techniques in order to identify small-molecule ligands for MBT-containing proteins. MBT domains constitute a novel class of chromatin regulators, epigenetic-code “readers”, associated with chromatin condensation and gene repression, ultimately affecting processes such as differentiation, mitotic progression and tumor suppression1–3, 9.
In this report, we have made use of two parallel and complementary strategies: (i) ad hoc substructure searches for ligands possessing a lysine-like fragment, potentially resulting in structurally diverse hits with unexpected binding modes, (ii) a semi-automated sequential protocol involving 3D pharmacophores and structure-based screening to detect hits whose binding mode mimics that of endogenous ligands thus providing structural insights to subsequent potency improvements.
Both strategies produced plausible hit hypotheses leading to the purchase and experimental testing of the most promising compounds. We applied a recently developed screening technique24, making use of the AlphaScreen™ technology, to assess the potency of virtual hits against a panel of 4 MBT-containing proteins. A total of 19 tractable MBT antagonists, coming from both 2D and structure-based screening protocols, showed specific dose-dependent effects in the AlphaScreen™ assay.
After appropriate optimization, these hits may provide a basis to study the biological function as well as pharmaceutical potential of MBT-containing proteins as a new target class.
We thank Structural Genomics Consortium, Toronto for providing protein constructs. We also thank Dr. Duane Bronson for assistance with ScreenAble software. This research was supported by startup funds to SVF provided by the Carolina Partnership and by the grant RC1-GM090732-01 from the National Institutes of Health.