Fundamentally, DrugBank is a dual-purpose bioinformatics–cheminformatics database with a strong focus on quantitative, analytic or molecular-scale information about both drugs and drug targets. In many respects it combines the data-rich molecular biology content normally found in curated sequence databases such as Swiss-Prot and UniProt (6) with the equally rich data found in medicinal chemistry textbooks and chemical reference handbooks. By bringing these two disparate types of information together into one unified and freely available resource, we wanted to allow educators and researchers from diverse disciplines and backgrounds (academic, industrial, clinical, non-clinical) to conduct the type of in silico learning and discovery that is now routine in the world of genomics and proteomics.
The diversity of data types and the required breadth of domain knowledge, combined with the fact that the data were mostly ‘paper-bound’, made the assembly of DrugBank both difficult and time-consuming. To compile, confirm and validate this comprehensive collection of data, more than a dozen textbooks, several hundred journal articles, nearly 30 different electronic databases, and at least 20 in-house or web-based programs were individually searched, accessed, compared, written or run over the course of four years. The team of DrugBank archivists and annotators included two accredited pharmacists, a physician and three bioinformaticians with dual training in computing science and molecular biology/chemistry.
DrugBank currently contains >4100 drug entries, corresponding to >12 000 different trade names and synonyms. These drug entries were chosen according to the following rules: the molecule must contain more than one type of atom, be non-redundant, have a known chemical structure and be identified as a drug or drug-like molecule by at least one reputable data source. To facilitate more targeted research and exploration, DrugBank is divided into four major categories: (i) FDA-approved small molecule drugs (>700 entries), (ii) FDA-approved biotech (protein/peptide) drugs (>100 entries), (iii) nutraceuticals or micronutrients such as vitamins and metabolites (>60 entries) and (iv) experimental drugs, including unapproved drugs, de-listed drugs, illicit drugs, enzyme inhibitors and potential toxins (3200 entries). These individual ‘Drug Types’ are also bundled into two larger categories: all FDA drugs (Approved Drugs) and All Compounds (Experimental + FDA + nutraceuticals). DrugBank's coverage for non-trivial FDA-approved drugs is ~80% complete. In addition, >14 000 protein (i.e. drug target) sequences are linked to these drug entries. More complete information about the numbers of drugs, drug targets and non-redundant drug targets (including their sequences) is available on the DrugBank ‘download’ page. The entire database, including text, sequence, structure and image data, occupies nearly 16 gigabytes, most of which can be freely downloaded.
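The four inclusion rules listed above can be read as a simple filter over candidate molecules. The following Python sketch is illustrative only: the `Candidate` record, its field names and the name-based redundancy check are assumptions made for this example and do not describe how DrugBank's annotators actually applied the rules.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for a molecule under consideration."""
    name: str
    formula: str         # simple formula such as "C23H32N2O5"; "" if unknown
    has_structure: bool  # True if a chemical structure is known
    n_sources: int       # reputable sources describing it as a drug


def element_types(formula):
    """Return the set of element symbols appearing in a simple formula."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def passes_rules(candidate, seen_names):
    """Apply the four inclusion rules to one candidate."""
    return (
        len(element_types(candidate.formula)) > 1         # more than one type of atom
        and candidate.name.lower() not in seen_names      # non-redundant (by name, an assumption)
        and candidate.has_structure                       # known chemical structure
        and candidate.n_sources >= 1                      # at least one reputable source
    )


def select_entries(candidates):
    """Keep the candidates that satisfy every rule, in input order."""
    seen, kept = set(), []
    for candidate in candidates:
        if passes_rules(candidate, seen):
            kept.append(candidate)
            seen.add(candidate.name.lower())
    return kept
```

A real redundancy check would compare structures rather than names; the name comparison here only keeps the sketch self-contained.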
DrugBank is a fully searchable web-enabled resource with many built-in tools and features for viewing, sorting and extracting drug or drug target data. Detailed instructions on where to locate and how to use these browsing/search tools are provided on the DrugBank homepage. As with any web-enabled database, DrugBank supports standard text queries (through the text search box located on the home page). It also offers general database browsing using the ‘Browse’ and ‘PharmaBrowse’ buttons located at the top of each DrugBank page. To facilitate general browsing, DrugBank is divided into synoptic summary tables which, in turn, are linked to more detailed ‘DrugCards’, in analogy to the very successful GeneCards concept (7). All of DrugBank's summary tables can be rapidly browsed, sorted or reformatted (using up to six different criteria) in a manner similar to the way PubMed abstracts may be viewed. Clicking on the DrugCard button found in the leftmost column of any given DrugBank summary table opens a webpage describing the drug of interest in much greater detail. Each DrugCard entry contains >80 data fields, with half of the information devoted to drug/chemical data and the other half devoted to drug target or protein data (see the summary of DrugCard data fields). In addition to providing comprehensive numeric, sequence and textual data, each DrugCard also contains hyperlinks to other databases, abstracts, digital images and interactive applets for viewing molecular structures (see the screenshot montage for Ramipril). In addition to the general browsing features, DrugBank also provides a more specialized ‘PharmaBrowse’ feature. This is designed for pharmacists, physicians and medicinal chemists who tend to think of drugs in clusters of indications or drug classes. This particular browsing tool provides navigation hyperlinks to >70 drug classes, each of which in turn lists the FDA-approved drugs associated with that class. Each drug name is then linked to its respective DrugCard.
Summary of the data fields or data types found in each DrugCard
A screenshot montage of the DrugBank Database showing several possible views of information describing the drug Ramipril. Not all fields are shown.
A key feature that distinguishes DrugBank from other on-line drug resources is its extensive support for higher level database searching and selecting functions. In addition to the data viewing and sorting features already described, DrugBank also offers a local BLAST (8) search that supports both single and multiple sequence queries, a boolean text search [using GLIMPSE; (9)], a chemical structure search utility and a relational data extraction tool (10). These can all be accessed via the database navigation bar located at the top of every DrugBank page.
The BLAST search (SeqSearch) is particularly useful because it can allow users to quickly and simply identify drug leads from newly sequenced pathogens. Specifically, a new sequence, a group of sequences or even an entire proteome can be searched against DrugBank's database of known drug target sequences by pasting the FASTA-formatted sequence (or sequences) into the SeqSearch query box and pressing the ‘submit’ button. A significant hit reveals, through the associated DrugCard hyperlink, the name(s) or chemical structure(s) of potential drug leads that may act on that query protein (or proteome).
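Because SeqSearch accepts one or many FASTA-formatted sequences in a single submission, it can help to check locally that a pasted query splits into the intended records before submitting it. The Python sketch below is a minimal illustration of FASTA parsing; the function name `parse_fasta` and the example sequences are hypothetical, and the code does not reproduce SeqSearch's own input handling or run BLAST.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-protein query, as it might be pasted into a query box.
query = """>protein_1 hypothetical pathogen protein
MKTAYIAK
QRQISFVK
>protein_2
GSHMLEDP
"""
records = parse_fasta(query)
```

Each resulting pair could then be passed to a local BLAST installation or submitted as a batch, depending on the size of the query set.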
DrugBank's structure similarity search tool (ChemQuery) can be used in much the same way as its sequence search tools. Users may sketch (through ACD's freely available chemical sketching applet) or paste a SMILES string (11) of a possible lead compound into the ChemQuery window. Submitting the query launches a structure similarity search that looks for substructures of the query compound that match entries in DrugBank's database of known drug or drug-like compounds. High scoring hits are presented in a tabular format with hyperlinks to the corresponding DrugCards (which in turn link to the protein target). The ChemQuery tool allows users to quickly determine whether their compound of interest acts on the desired protein target. This kind of chemical structure search may also reveal whether the compound of interest may unexpectedly interact with unintended protein targets. In addition to these structure similarity searches, the ChemQuery utility also supports compound searches on the basis of chemical formula and molecular weight ranges.
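The formula and molecular weight searches mentioned above rest on straightforward arithmetic, which the following Python sketch illustrates. It is not ChemQuery's implementation: the function names are hypothetical, the mass table covers only a few elements (rounded standard atomic weights), and the parser assumes simple formulas without parentheses, charges or isotopes.

```python
import re

# Rounded standard atomic weights for a few common elements (illustrative subset).
ATOMIC_MASS = {
    "C": 12.011, "H": 1.008, "N": 14.007,
    "O": 15.999, "S": 32.06, "Cl": 35.45,
}


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C23H32N2O5'."""
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def molecular_weight(formula):
    """Sum atomic masses; raises KeyError for elements missing from the table."""
    return sum(ATOMIC_MASS[s] * n for s, n in parse_formula(formula).items())


def in_weight_range(formula, low, high):
    """True if the formula's molecular weight lies within [low, high] daltons."""
    return low <= molecular_weight(formula) <= high


# Ramipril, C23H32N2O5, has a molecular weight of roughly 416.5 daltons.
ramipril_mw = molecular_weight("C23H32N2O5")
```

A range query such as `in_weight_range("C23H32N2O5", 400, 450)` then reduces to a single comparison per compound.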
DrugBank's data extraction utility (Data Extractor) employs a simple relational database system that allows users to select one or more data fields and to search for ranges, occurrences or partial occurrences of words, strings or numbers. The Data Extractor uses clickable web forms so that users may intuitively construct SQL-like queries. Using a few mouse clicks, it is relatively simple to construct very complex queries (‘find all drugs less than 600 daltons with LogPs less than 3.2 that are antihistamines’) or to build a series of highly customized tables. The output from these queries is provided in HTML format with hyperlinks to all associated DrugCards.
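The example query quoted above maps naturally onto a relational SELECT statement. The Python sketch below expresses it against an in-memory SQLite table; the table name, column names and all rows are hypothetical placeholders invented for illustration, and they reflect neither DrugBank's actual schema nor real drug data.

```python
import sqlite3

# In-memory database with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drugs (name TEXT, mol_weight REAL, logp REAL, category TEXT)"
)
# Placeholder rows; the names and values are made up for this example.
conn.executemany(
    "INSERT INTO drugs VALUES (?, ?, ?, ?)",
    [
        ("drug_a", 255.4, 2.9, "antihistamine"),
        ("drug_b", 501.7, 5.6, "antihistamine"),
        ("drug_c", 416.5, 1.4, "ACE inhibitor"),
        ("drug_d", 650.0, 2.0, "antihistamine"),
    ],
)


def find_drugs(connection, max_weight, max_logp, category):
    """Return names of drugs below both thresholds in the given category."""
    rows = connection.execute(
        "SELECT name FROM drugs "
        "WHERE mol_weight < ? AND logp < ? AND category = ? "
        "ORDER BY name",
        (max_weight, max_logp, category),
    )
    return [row[0] for row in rows]


# 'Find all drugs less than 600 daltons with LogPs less than 3.2
# that are antihistamines.'
hits = find_drugs(conn, 600, 3.2, "antihistamine")
```

Parameter placeholders (`?`) keep the thresholds separate from the query text, which is roughly what a form-based interface achieves when it assembles a query from user selections.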