Consideration of biomolecules in terms of their molecular building blocks provides valuable new information regarding their synthesis, degradation and similarity. Here, we present the FragmentStore, a resource for the comparison of fragments found in metabolites, drugs or toxic compounds. Starting from 13000 metabolites, 16000 drugs and 2200 toxic compounds we generated 35000 different building blocks (fragments), which are not only relevant to their biosynthesis and degradation but also provide important information regarding side-effects and toxicity. The FragmentStore provides a variety of search options such as 2D structure, molecular weight, rotatable bonds, etc. Various analysis tools have been implemented including the calculation of amino acid preferences of fragments’ binding sites, classification of fragments based on the enzyme classification class of the enzyme(s) they bind to and small molecule library generation via a fragment-assembler tool. Using the FragmentStore, it is now possible to identify the common fragments of different classes of molecules and generate hypotheses about the effects of such intersections. For instance, the co-occurrence of fragments in different drugs may indicate similar targets and possible off-target interactions whereas the co-occurrence of fragments in a drug and a toxic compound/metabolite could be indicative of side-effects. The database is publicly available at: http://bioinformatics.charite.de/fragment_store.
Across all kingdoms of life, biomolecules are formed from molecular building blocks, suggesting that this principle has been favoured during evolution. During metabolism, the building block nature of biomolecules facilitates their degradation into fragments. A systems biology view of metabolism would benefit from considering these fragments. For instance, a study investigating the set of metabolites available to different organisms found that common substructures were observed within the uniquely used compounds in metabolic pathways, indicating that there are metabolite-mediated relationships between different organism groups (1).
During the past few decades, relatively few drugs have reached the market due to high failure rates at the clinical testing stage (2). The two main causes of these failures are lack of efficacy and toxicity (3). In fact, one-third of potential therapeutic compounds fail in clinical trials or are removed from the market at a later stage due to unacceptable side-effects, often caused by the drug binding to an off-target (4). Such drug polypharmacology has driven the prediction and characterization of drug-target associations in order to identify possible side-effects of drugs and identify new opportunities for therapeutic intervention (5–8). Fragment-based drug design is a well-established approach that has led to the successful development of novel leads for many different targets (9). However, fragment-based approaches can also be employed to perform small molecule building-block analyses for the identification of fragments responsible for off-target binding and side-effects (10). The Biochemical Substructure Search Catalogue (BiSSCat) stores computationally constructed substructures of compounds and can be used to determine possible additional substrates for enzymes (11). Identification of fragments that have a role in toxicity, side-effects or that mediate off-target interactions would aid the development of safe and effective medical drugs. Furthermore, such analyses could help to improve future toxicity testing in line with the US National Academy of Sciences’ recommendations to increase efficiency and decrease animal usage (12).
Comparison of the fragments present in different classes of small molecules such as metabolites, toxic compounds and drugs facilitates the answering of questions such as: (i) how many common fragments are there in the different classes of molecules? (ii) How does the synthesis and/or degradation of metabolites depend on their fragment composition? (iii) Do those molecules containing toxic fragments cause more side-effects? (iv) Can knowledge about common fragments in small molecules help optimize drug polypharmacology? (v) Can side-effects of drugs be rationalized through the identification of common fragments with metabolites? To help researchers answer such questions we developed the FragmentStore database which consists of more than 35000 fragments and property data such as physicochemical information and binding site preferences.
The FragmentStore database consists of more than 35000 different fragments resulting from fragmentation of more than 13000 metabolic compounds, 2200 toxic compounds and 16000 drugs and pharmacologically characterized compounds using two different fragmentation strategies: (i) the compounds were recursively fragmented according to the recap-rules and (ii) chains between ring structures were cut out.
For completeness, the compounds were also recursively fragmented according to their rotatable bonds, which alone resulted in more than 150000 fragments. Properties such as molecular weight, logP and hydrophobicity are stored for all fragments. Furthermore, binding site preferences were determined for each fragment using all structures in the Protein Data Bank (PDB) (13) that are bound to a ligand containing the fragment (provided that at least one crystallized structure is available in the PDB). These binding site preferences are calculated based on the frequency of amino acid types binding a fragment compared to the amino acid frequencies for the entire protein surface. The amino acid binding site preferences for each fragment are displayed in a histogram with one bar for each amino acid, thus allowing users to ascertain whether there are particular patterns of amino acids responsible for binding particular fragments. Moreover, identical fragments in different binding sites are superimposed and shown with the amino acids that form the binding pocket. These superimpositions give detailed information about the mechanism of fragment binding and about the specificity of interaction in both homologous and non-homologous proteins.
FragmentStore offers various ways of searching the database, including by 2D structure, molecular weight and number of rotatable bonds.
We have also implemented a fragment-assembler tool, which allows users to build a library of small molecules based on the selection of fragments of their choice using reverse recap-rules.
More than 35000 fragments were generated by fragmenting three different compound libraries comprising metabolites, toxic molecules and drugs. More than 13000 KEGG-metabolites (15) were fragmented for the metabolite dataset and more than 2200 compounds from the SuperToxic database (16) were fragmented for the toxic dataset.
Altogether, the drug dataset consists of fragments from more than 16000 unique drugs from the following resources: SuperDrug (~2400 drugs) (16), KEGG-drugs (~7000) (15,17), DrugBank’s approved drugs (~1300) (18), WDI drugs (Derwent World Drug Index) (~7000) and CMC drugs (MDL) (~8000). For the last two databases, we consider only those drugs that are publicly available, e.g. in PubChem. The fragments have direct links to the compounds of SuperDrug, KEGG-drugs, DrugBank, SuperToxic and KEGG-metabolites.
The ligands in the above-mentioned datasets were fragmented using three different strategies. For the first strategy, the compounds were fragmented recursively using the recap-rules (19). The recap methodology helps to identify fragments which are useful for combinatorial chemistry. This fragmentation method yields libraries of fragments that can be connected by bonds that are easy to synthesize, e.g. ester bonds. Altogether, the recap-rules comprise eleven different bond types: amide, ester, amine, urea, ether, olefin, quaternary nitrogen, bond between aromatic nitrogen and carbon, bond between lactam-nitrogen and carbon, bond between aromatic rings, and sulphonamide.
For the second strategy, chains between two ring structures are cut out. Due to its non-redundancy, this fragment library is more suitable for statistical analysis than libraries generated using recursive methods. For the third strategy, the ligands were recursively fragmented by their rotatable bonds. The latter fragmentation rule produced the most fragments (see Figure 1).
To validate the fragments for inclusion into FragmentStore, a modification of the Lipinski rule-of-five was used. Astex Technology’s rule-of-three is useful for constructing fragment libraries that are efficient for lead generation (20). The rule-of-three criteria for fragments (which can later be combined into compounds) are that they should have a molecular weight of no more than 300 g/mol and that the number of hydrogen bond donors, the number of hydrogen bond acceptors and the clogP-value should each be no more than three. Additionally, two further properties are considered during the selection of fragments for building the fragment library: the number of rotatable bonds has to be less than four and the polar surface area has to be at most 60 Å². For inclusion into the FragmentStore database, a fragment is allowed to break at most one of these rules. Furthermore, every fragment in the FragmentStore consists of at least three heavy atoms.
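The inclusion criteria above can be illustrated with a minimal Python sketch. It is not the code used to build FragmentStore: the `Fragment` record and its field names are hypothetical, the property values are assumed to have been computed beforehand, and "only allowed to break one of these rules" is read here as "at most one violation among the six listed criteria".

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    # Hypothetical record; all values are assumed to be pre-computed.
    heavy_atoms: int
    mol_weight: float          # g/mol
    h_donors: int
    h_acceptors: int
    clogp: float
    rotatable_bonds: int
    polar_surface_area: float  # square angstroms


def rule_violations(f: Fragment) -> int:
    """Count how many of the six criteria from the text are broken."""
    checks = [
        f.mol_weight <= 300,
        f.h_donors <= 3,
        f.h_acceptors <= 3,
        f.clogp <= 3,
        f.rotatable_bonds < 4,
        f.polar_surface_area <= 60,
    ]
    return sum(not ok for ok in checks)


def accept(f: Fragment, max_violations: int = 1) -> bool:
    """Accept fragments with >= 3 heavy atoms and at most one violation (assumed reading)."""
    return f.heavy_atoms >= 3 and rule_violations(f) <= max_violations
```

Under these assumptions, a fragment that is slightly too heavy but otherwise compliant is kept, whereas one that is both too heavy and has four hydrogen bond donors is rejected.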
After fragmentation, the binding site preferences were calculated for all fragments which are co-crystallized in the Protein Data Bank. For each amino acid, the frequency of occurrence is calculated at the fragment’s binding site and compared to the frequency of occurrence at the protein’s surface. The binding site of a fragment is defined as all amino acids within 5 Å of the fragment. FragmentStore provides these binding site preferences as bar charts. The fragments which were co-crystallized in more than one protein structure were superimposed using the superimposition function of PyMOL (21). The superimposed fragments and their binding sites are visualized using Jmol.
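The binding site definition and the preference calculation can be sketched as follows. This is an illustration under stated assumptions rather than the FragmentStore implementation: coordinates are plain `(x, y, z)` tuples in Å, the function names are hypothetical, and the "comparison" of frequencies is assumed to be a simple ratio of relative frequencies (binding site over surface), which the text does not specify.

```python
from collections import Counter
from math import dist

CUTOFF = 5.0  # angstroms; binding site definition taken from the text


def binding_site_residues(fragment_atoms, residue_atoms, cutoff=CUTOFF):
    """Return ids of residues with at least one atom within `cutoff` of the fragment.

    fragment_atoms: list of (x, y, z) tuples
    residue_atoms:  dict mapping a residue id to a list of (x, y, z) tuples
    """
    return {
        res_id
        for res_id, atoms in residue_atoms.items()
        if any(dist(a, f) <= cutoff for a in atoms for f in fragment_atoms)
    }


def preferences(site_types, surface_types):
    """Relative frequency at the binding site divided by that at the surface.

    Both arguments are lists of amino acid type names (one entry per residue).
    The ratio is an assumed form of the comparison described in the text.
    Types absent from the surface list are skipped.
    """
    site, surface = Counter(site_types), Counter(surface_types)
    n_site, n_surface = sum(site.values()), sum(surface.values())
    if n_site == 0 or n_surface == 0:
        return {}
    return {
        aa: (site[aa] / n_site) / (n / n_surface)
        for aa, n in surface.items()
    }
```

With this form, a value above one means the amino acid type is enriched at the binding site relative to the protein surface, and a value below one means it is depleted.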
We have also implemented a fragment-assembler tool, which allows users to build a library of small molecules based on the selection of fragments of their choice using reverse recap-rules. The user may choose up to three fragments, which are combined to make new compounds that satisfy Lipinski’s rule-of-five (22). This rule defines properties that compounds should fulfil to become drug candidates: an orally available drug should have no more than five hydrogen bond donors, no more than ten hydrogen bond acceptors, a molecular weight of <500 g/mol and a LogP-value below five. The LogP-value describes the lipophilicity of a molecule and is defined as the logarithm of the 1-octanol/water partition coefficient.
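As an illustration of the filter applied to assembled compounds, the following Python sketch checks the four rule-of-five thresholds as stated in the text. The function name and argument names are hypothetical, and the property values are assumed to be supplied by a separate cheminformatics tool; the assembly step itself is not shown.

```python
def lipinski_violations(h_donors, h_acceptors, mol_weight, logp):
    """Return the names of the rule-of-five criteria that a compound breaks."""
    violations = []
    if h_donors > 5:          # no more than five hydrogen bond donors
        violations.append("h_donors")
    if h_acceptors > 10:      # no more than ten hydrogen bond acceptors
        violations.append("h_acceptors")
    if mol_weight >= 500:     # molecular weight must be < 500 g/mol
        violations.append("mol_weight")
    if logp >= 5:             # LogP must be below five
        violations.append("logp")
    return violations
```

An assembled compound would pass this filter only when the returned list is empty.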
As fragment assembly is computationally expensive, the user is sent the results (in SMILES format) by email within 20 min. The FragmentStore also provides an example set of fragments that can be used to demonstrate the capabilities of the fragment-assembler.
In order to search for fragments using structural features, bit vector ‘structural fingerprints’, which encode chemical and topological characteristics of a molecule, were included. The structural fingerprint was implemented using Open Babel (http://openbabel.sourceforge.net/), which offers four different fingerprints (FP2, FP3, FP4, MACCS).
Fingerprint 2 (http://openbabel.org/wiki/FP2) is widely used for the comparison of small molecules; it is path-based and indexes linear fragments of up to seven atoms. However, this fingerprint is not optimal for the comparison of small fragments. To improve the comparison of small fragments, a combination of fingerprints 2 and 4 (FP2, FP4) was used. Fingerprint 4 (http://openbabel.org/wiki/FP4) is based on a set of SMARTS patterns and also considers functional groups. The combined fingerprint gave the best results when comparing fragments.
This combined structural fingerprint is pre-calculated for all fragments in the database and is calculated for each query fragment so that it can be compared with the entries of FragmentStore. For the similarity search the Tanimoto coefficient is used, which gives values in the range of zero (no bits in common) to unity (all bits the same).
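The similarity measure can be sketched in a few lines of Python. The sketch assumes that each fingerprint is available as a set of indices of its set bits and that the combined fingerprint is formed by keeping the FP2 and FP4 bits side by side; how the two fingerprints are actually combined in FragmentStore is not stated here, so `combine` is a hypothetical stand-in.

```python
def combine(fp2_bits, fp4_bits):
    """Merge two fingerprints, tagging each bit with its origin so indices cannot collide."""
    return {("FP2", i) for i in fp2_bits} | {("FP4", i) for i in fp4_bits}


def tanimoto(a, b):
    """Tanimoto coefficient: shared set bits divided by bits set in either fingerprint."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


# Toy bit indices, for illustration only
query = combine({1, 2, 3}, {0})
entry = combine({2, 3, 4}, {0})
score = tanimoto(query, entry)  # 3 shared bits out of 5 set in either fingerprint
```

A similarity search then amounts to computing this score between the query fingerprint and every pre-calculated fingerprint and ranking the entries by it.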
FragmentStore is designed as a relational database on a MySQL server. Additionally, the MyChem package (http://mychem.sourceforge.net/) is installed to provide a complete set of functions for handling chemical data within MySQL. Most of the functions used by MyChem depend upon Open Babel. The structural fingerprint is implemented in Open Babel 2.2.3 (http://openbabel.sourceforge.net/). To allow the upload or drawing of a query structure, the Marvin Sketch plugin (http://www.chemaxon.com) was installed. For the visualisation of the 3D structures, Jmol (http://www.jmol.org/) was installed. The website is built with PHP and web access is enabled via Apache HTTP Server 2.2.
To find a ligand for a specific binding site of a target, the first step could be the characterisation of the pocket. Afterwards, fragments suited to the specific binding site can be searched for in FragmentStore. If, for example, one part of the binding site consists of many hydrophobic amino acids such as methionine, the user can search the FragmentStore database for fragments which have hydrophobic binding site preferences. Besides the fragment and its physicochemical properties, the user gets the binding site preference as a bar plot. Furthermore, the binding sites in which the fragment occurs are superimposed and displayed in 3D using Jmol (Figure 2).
In the following analysis we consider the fragments which are produced after fragmenting the SuperDrug drugs, the KEGG-metabolites and the highly toxic compounds from the SuperToxic database using the linker strategy. All ligands common to the SuperDrug and SuperToxic datasets were excluded from the latter dataset and all ligands containing, e.g. an ‘R’- or ‘*’-atom were excluded from the KEGG-metabolites. The intersections of the three ligand datasets and of the resulting fragment datasets are shown in Figure 3. Only a small number of fragments are shared between all three classes of molecules. These fragments tend to be very small and are probably not essential for the compound’s specificity. As one would expect, proportionally fewer fragments are shared between the toxic and metabolite datasets than between the drug and metabolite datasets. Surprisingly, although the toxic and drug datasets have no common compounds, the datasets share many similar fragments. These may contribute to the toxic effects of drugs and even to their side-effects. Figure 3 shows an example of a fragment which occurs only in the toxic and drug datasets but not in the KEGG-metabolites. The fragment is part of the chemotherapeutic drug Prednimustine (23) and of several toxic compounds, e.g. 4′-(di-2″-chloroethylamino)-4-hydroxy-3-methyldiphenylamine. This compound was shown to have an LD50 value of 1.43 mg/kg (i.p.) in rat (24) and is therefore highly toxic.
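The overlap analysis amounts to computing the seven regions of a three-set Venn diagram. The following Python sketch shows one way to do this, assuming that each fragment is represented by a unique identifier such as a canonical SMILES string; the function name and region labels are hypothetical.

```python
def venn_counts(drugs, toxic, metabolites):
    """Count fragments in each of the seven regions of a three-set Venn diagram."""
    all_three = drugs & toxic & metabolites
    return {
        "drugs_only": len(drugs - toxic - metabolites),
        "toxic_only": len(toxic - drugs - metabolites),
        "metabolites_only": len(metabolites - drugs - toxic),
        "drugs_and_toxic": len((drugs & toxic) - all_three),
        "drugs_and_metabolites": len((drugs & metabolites) - all_three),
        "toxic_and_metabolites": len((toxic & metabolites) - all_three),
        "all_three": len(all_three),
    }


# Toy identifiers for illustration only; real input would be fragment identifiers
counts = venn_counts(
    drugs={"a", "b", "c", "d"},
    toxic={"b", "c", "e"},
    metabolites={"c", "d", "f"},
)
```

Fragments counted under `drugs_and_toxic` correspond to the case discussed above: present in drugs and toxic compounds but absent from the metabolites.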
The FragmentStore provides data on fragments from drugs, metabolites and toxic compounds. A fragmentation method should consider synthetic rules and distinguish between linkers and fragments. Co-occurrence of fragments in different drugs may indicate similar (off-)targets and the co-occurrence of fragments in drugs and toxic compounds or metabolites could be indicative of side-effects.
The systematic (computational) synthesis of libraries from three fragments, as provided by the fragment-assembler in FragmentStore, leads on average to 10000 compounds, which would be reasonable to sample the chemical space of a particular medical target.
A future goal of the FragmentStore is a mapping of all fragments onto metabolic and signaling pathways, hopefully elucidating interrelations between fragments, drugs, targets and therapeutic effects. For the mapping we will consider subtle changes and stereochemistry between the enzymatic steps of metabolic pathways. In a next step in-depth analysis will be carried out regarding the compounds acting on different receptors in the signaling cascades. The result will be a distribution of fragments/scaffolds over certain regions of regulation—such as particular kinases or neuronal receptors that might explain effects like multi-specificity.
The FragmentStore database is freely available under the URL: http://bioinformatics.charite.de/fragment_store/ and will be updated regularly.
This work was supported by Deutsche Forschungsgemeinschaft (SFB 449), International Research Training Group (IRTG) Berlin–Boston–Kyoto, Bundesministerium für Bildung und Forschung (BMBF) and European Union (EU). Funding for open access charge: DFG, BMBF and EU.
Conflict of interest statement. None declared.
The authors would like to thank B. Grüning for help with MyChem and database setup and U. Schmidt for her support with the toxicity data and the figures. They would also like to thank A. Chefai for her support in developing the fragment-assembler.