Search tips
Search criteria

Results 1-11 (11)

Clipboard (0)

Select a Filter Below

Year of Publication
1.  A tale of two drug targets: the evolutionary history of BACE1 and BACE2 
Frontiers in Genetics  2013;4:293.
The beta amyloid (APP) cleaving enzyme (BACE1) has been a drug target for Alzheimer's Disease (AD) since 1999 with lead inhibitors now entering clinical trials. In 2011, the paralog, BACE2, became a new target for type II diabetes (T2DM) having been identified as a TMEM27 secretase regulating pancreatic β cell function. However, the normal roles of both enzymes are unclear. This study outlines their evolutionary history and new opportunities for functional genomics. We identified 30 homologs (UrBACEs) in basal phyla including Placozoans, Cnidarians, Choanoflagellates, Porifera, Echinoderms, Annelids, Mollusks and Ascidians (but not Ecdysozoans). UrBACEs are predominantly single copy, show 35–45% protein sequence identity with mammalian BACE1, are ~100 residues longer than cathepsin paralogs with an aspartyl protease domain flanked by a signal peptide and a C-terminal transmembrane domain. While multiple paralogs in Trichoplax and Monosiga pre-date the nervous system, duplication of the UrBACE in fish gave rise to BACE1 and BACE2 in the vertebrate lineage. The latter evolved more rapidly as the former maintained the emergent neuronal role. In mammals, Ka/Ks for BACE2 is higher than BACE1 but low ratios for both suggest purifying selection. The 5' exons show higher Ka/Ks than the catalytic section. Model organism genomes show the absence of certain BACE human substrates when the UrBACE is present. Experiments could thus reveal undiscovered substrates and roles. The human protease double-target status means that evolutionary trajectories and functional shifts associated with different substrates will have implications for the development of clinical candidates for both AD and T2DM. A rational basis for inhibition specificity ratios and assessing target-related side effects will be facilitated by a more complete picture of BACE1 and BACE2 functions informed by their evolutionary context.
PMCID: PMC3865767  PMID: 24381583
BACE1; BACE2; Alzheimer's Disease; type II diabetes; protein family evolution
2.  The IUPHAR/BPS Guide to PHARMACOLOGY: an expert-driven knowledgebase of drug targets and their ligands 
Nucleic Acids Research  2013;42(Database issue):D1098-D1106.
The International Union of Basic and Clinical Pharmacology/British Pharmacological Society (IUPHAR/BPS) Guide to PHARMACOLOGY ( is a new open access resource providing pharmacological, chemical, genetic, functional and pathophysiological data on the targets of approved and experimental drugs. Created under the auspices of the IUPHAR and the BPS, the portal provides concise, peer-reviewed overviews of the key properties of a wide range of established and potential drug targets, with in-depth information for a subset of important targets. The resource is the result of curation and integration of data from the IUPHAR Database (IUPHAR-DB) and the published BPS ‘Guide to Receptors and Channels’ (GRAC) compendium. The data are derived from a global network of expert contributors, and the information is extensively linked to relevant databases, including ChEMBL, DrugBank, Ensembl, PubChem, UniProt and PubMed. Each of the ∼6000 small molecule and peptide ligands is annotated with manually curated 2D chemical structures or amino acid sequences, nomenclature and database links. Future expansion of the resource will complete the coverage of all the targets of currently approved drugs and future candidate targets, alongside educational resources to guide scientists and students in pharmacological principles and techniques.
PMCID: PMC3965070  PMID: 24234439
3.  Tracking 20 Years of Compound-to-Target Output from Literature and Patents 
PLoS ONE  2013;8(10):e77142.
The statistics of drug development output and declining yield of approved medicines has been the subject of many recent reviews. However, assessing research productivity that feeds development is more difficult. Here we utilise an extensive database of structure-activity relationships extracted from papers and patents. We have used this database to analyse published compounds cumulatively linked to nearly 4000 protein target identifiers from multiple species over the last 20 years. The compound output increases up to 2005 followed by a decline that parallels a fall in pharmaceutical patenting. Counts of protein targets have plateaued but not fallen. We extended these results by exploring compounds and targets for one large pharmaceutical company. In addition, we examined collective time course data for six individual protease targets, including average molecular weight of the compounds. We also tracked the PubMed profile of these targets to detect signals related to changes in compound output. Our results show that research compound output had decreased 35% by 2012. The major causative factor is likely to be a contraction in the global research base due to mergers and acquisitions across the pharmaceutical industry. However, this does not rule out an increasing stringency of compound quality filtration and/or patenting cost control. The number of proteins mapped to compounds on a yearly basis shows less decline, indicating the cumulative published target capacity of global research is being sustained in the region of 300 proteins for large companies. The tracking of six individual targets shows uniquely detailed patterns not discernible from cumulative snapshots. These are interpretable in terms of events related to validation and de-risking of targets that produce detectable follow-on surges in patenting. Further analysis of the type we present here can provide unique insights into the process of drug discovery based on the data it actually generates.
PMCID: PMC3812171  PMID: 24204758
4.  InChI in the wild: an assessment of InChIKey searching in Google 
While chemical databases can be queried using the InChI string and InChIKey (IK) the latter was designed for open-web searching. It is becoming increasingly effective for this since more sources enhance crawling of their websites by the Googlebot and consequent IK indexing. Searchers who use Google as an adjunct to database access may be less familiar with the advantages of using the IK as explored in this review. As an example, the IK for atorvastatin retrieves ~200 low-redundancy links from a Google search in 0.3 of a second. These include most major databases and a very low false-positive rate. Results encompass less familiar but potentially useful sources and can be extended to isomer capture by using just the skeleton layer of the IK. Google Advanced Search can be used to filter large result sets. Image searching with the IK is also effective and complementary to open-web queries. Results can be particularly useful for less-common structures as exemplified by a major metabolite of atorvastatin giving only three hits. Testing also demonstrated document-to-document and document-to-database joins via structure matching. The necessary generation of an IK from chemical names can be accomplished using open tools and resources for patents, papers, abstracts or other text sources. Active global sharing of local IK-linked information can be accomplished via surfacing in open laboratory notebooks, blogs, Twitter, figshare and other routes. While information-rich chemistry (e.g. approved drugs) can exhibit swamping and redundancy effects, the much smaller IK result sets for link-poor structures become a transformative first-pass option. The IK indexing has therefore turned Google into a de-facto open global chemical information hub by merging links to most significant sources, including over 50 million PubChem and ChemSpider records. The simplicity, specificity and speed of matching make it a useful option for biologists or others less familiar with chemical searching. However, compared to rigorously maintained major databases, users need to be circumspect about the consistency of Google results and provenance of retrieved links. In addition, community engagement may be necessary to ameliorate possible future degradation of utility.
PMCID: PMC3598674  PMID: 23399051
InChI; InChIKey; Databases; Google; Chemical structures; Patents; PubChem; ChemSpider
5.  SARConnect: A Tool to Interrogate the Connectivity Between Proteins, Chemical Structures and Activity Data 
Molecular Informatics  2012;31(8):555-568.
The access and use of large-scale structure-activity relationships (SAR) is increasing as the range of targets and availability of bioactive compound-to-protein mappings expands. However, effective exploitation requires merging and normalisation of activity data, mappings to target classifications as well as visual display of chemical structure relationships. This work describes the development of the application “SARConnect” to address these issues. We discuss options for delivery and analysis of large-scale SAR data together with a set of use-cases to illustrate the design choices and utility. The main activity sources of ChEMBL,1 GOSTAR2 and AstraZeneca’s internal system IBIS, had already been integrated in Chemistry Connect.3 For target relationships we selected human UniProtKB/Swiss-Prot4 as our primary source of a heuristic target classification. Similarly, to explore chemical relationships we combined several methods for framework and scaffold analysis into a unified, hierarchical classification where ease of navigation was the primary goal. An application was built on TIBCO Spotfire to retrieve data for visual display. Consequently, users can explore relationships between target, activity and structure across internal, external and commercial sources that encompass approximately 3 million compounds, 2000 human proteins and 10 million activity values. Examples showing the utility of the application are given.
PMCID: PMC3535785  PMID: 23308082
Proteins; Activity data; Structure-activity relationships (SAR); Chemical structures
6.  Analysis of in vitro bioactivity data extracted from drug discovery literature and patents: Ranking 1654 human protein targets by assayed compounds and molecular scaffolds 
Since the classic Hopkins and Groom druggable genome review in 2002, there have been a number of publications updating both the hypothetical and successful human drug target statistics. However, listings of research targets that define the area between these two extremes are sparse because of the challenges of collating published information at the necessary scale. We have addressed this by interrogating databases, populated by expert curation, of bioactivity data extracted from patents and journal papers over the last 30 years.
From a subset of just over 27,000 documents we have extracted a set of compound-to-target relationships for biochemical in vitro binding-type assay data for 1,736 human proteins and 1,654 gene identifiers. These are linked to 1,671,951 compound records derived from 823,179 unique chemical structures. The distribution showed a compounds-per-target average of 964 with a maximum of 42,869 (Factor Xa). The list includes non-targets, failed targets and cross-screening targets. The top-278 most actively pursued targets cover 90% of the compounds. We further investigated target ranking by determining the number of molecular frameworks and scaffolds. These were compared to the compound counts as alternative measures of chemical diversity on a per-target basis.
The compounds-per-protein listing generated in this work (provided as a supplementary file) represents the major proportion of the human drug target landscape defined by published data. We supplemented the simple ranking by the number of compounds assayed with additional rankings by molecular topology. These showed significant differences and provide complementary assessments of chemical tractability.
PMCID: PMC3118229  PMID: 21569515
7.  Towards BioDBcore: a community-defined information specification for biological databases 
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources; and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
PMCID: PMC3017395  PMID: 21205783
8.  Towards BioDBcore: a community-defined information specification for biological databases 
Nucleic Acids Research  2010;39(Database issue):D7-D10.
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
PMCID: PMC3013734  PMID: 21097465
9.  Quantitative assessment of the expanding complementarity between public and commercial databases of bioactive compounds 
Since 2004 public cheminformatic databases and their collective functionality for exploring relationships between compounds, protein sequences, literature and assay data have advanced dramatically. In parallel, commercial sources that extract and curate such relationships from journals and patents have also been expanding. This work updates a previous comparative study of databases chosen because of their bioactive content, availability of downloads and facility to select informative subsets.
Where they could be calculated, extracted compounds-per-journal article were in the range of 12 to 19 but compound-per-protein counts increased with document numbers. Chemical structure filtration to facilitate standardised comparisons typically reduced source counts by between 5% and 30%. The pair-wise overlaps between 23 databases and subsets were determined, as well as changes between 2006 and 2008. While all compound sets have increased, PubChem has doubled to 14.2 million. The 2008 comparison matrix shows not only overlap but also unique content across all sources. Many of the detailed differences could be attributed to individual strategies for data selection and extraction. While there was a big increase in patent-derived structures entering PubChem since 2006, GVKBIO contains over 0.8 million unique structures from this source. Venn diagrams showed extensive overlap between compounds extracted by independent expert curation from journals by GVKBIO, WOMBAT (both commercial) and BindingDB (public) but each included unique content. In contrast, the approved drug collections from GVKBIO, MDDR (commercial) and DrugBank (public) showed surprisingly low overlap. Aggregating all commercial sources established that while 1 million compounds overlapped with PubChem 1.2 million did not.
On the basis of chemical structure content per se public sources have covered an increasing proportion of commercial databases over the last two years. However, commercial products included in this study provide links between compounds and information from patents and journals at a larger scale than current public efforts. They also continue to capture a significant proportion of unique content. Our results thus demonstrate not only an encouraging overall expansion of data-supported bioactive chemical space but also that both commercial and public sources are complementary for its exploration.
PMCID: PMC3225862  PMID: 20298516
10.  InterPro (The Integrated Resource of Protein Domains and Functional Sites) 
Yeast (Chichester, England)  2000;17(4):327-334.
The family and motif databases, PROSITE, PRINTS, Pfam and ProDom, have been integrated into a powerful resource for protein secondary annotation. As of June 2000, InterPro had processed 384 572 proteins in SWISS-PROT and TrEMBL. Because the contributing databases have different clustering principles and scoring sensitivities, the combined assignments compliment each other for grouping protein families and delineating domains. The graphic displays of all matches above the scoring thresholds enables judgements to be made on the concordances or differences between the assignments. The website links can be used to analyse novel sequences and for queries across the proteomes of 32 organisms, including the partial human set, by domain and/or protein family. An analysis of selected HtrA/DegQ proteases demonstrates the utility of this website for detailed comparative genomics. Further information on the project can be found at the European Bioinformatics Institute at
PMCID: PMC2448387  PMID: 11119311
11.  Comparing the Chemical Structure and Protein Content of ChEMBL, DrugBank, Human Metabolome Database and the Therapeutic Target Database 
Molecular Informatics  2013;32(11-12):881-897.
ChEMBL, DrugBank, Human Metabolome Database and the Therapeutic Target Database are resources of curated chemistry-to-protein relationships widely used in the chemogenomic arena. In this work we have extended an earlier analysis (PMID 22821596) by comparing chemistry and protein target content between 2010 and 2013. For the former, details are presented for overlaps and differences, statistics of stereochemistry as well as stereo representation and MW profiles between the four databases. For 2013 our results indicate quality improvements, major expansion, increased achiral structures and changes in MW distributions. An orthogonal comparison of chemical content with different sources inside PubChem highlights further interpretable differences. Expansion of protein content by UniProt IDs is also recorded for 2013 and Gene Ontology comparisons for human-only sets indicate differences. These emphasise the expanding complementarity of chemistry-to-protein relationships between sources, although different criteria are used for their capture.
PMCID: PMC3916886  PMID: 24533037
Compounds; Proteins; Drugs; Drug targets; Databases; InChI

Results 1-11 (11)