Drug-target relationships described in SuperTarget were obtained in three different ways. Starting with 2400 drugs and their synonyms from the SuperDrug Database (5), the text mining tool EbiMed (6) was used to extract relevant text passages containing potential drug-target relations from about 15 million public abstracts listed in PubMed. Many thousands of false-positive or irrelevant relations were eliminated by manual curation.
In parallel, potential drug-target relations were automatically extracted from Medline by searching for synonyms of drugs, proteins and Medical Subject Headings (MeSH terms) describing groups of proteins (7). MeSH terms were used to capture and down-weight interactions that are not explicitly described in the abstracts, e.g. for protein families or protein complexes. In the case of families, the specific interacting family member might not be known yet, whereas in the case of complexes, the drug might interact with more than one subunit. Proteins were assigned to MeSH terms by a semi-automated procedure relying on mappings provided by MeSH and on synonyms of proteins that are aggregated in the STRING resource (8). Proteins that were often mentioned in abstracts, but could not be automatically assigned to families, were manually assigned. Depending on the size and nature of the families, the confidence of an interaction between drugs and individual proteins was decreased; more heterogeneous families were assigned a lower confidence. The most probable candidates were identified using a benchmarking scheme (8) and manually curated.
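The exact formula used to lower the confidence is not given here, so the following Python sketch shows only one plausible scheme. It assumes that the confidence of a drug-family relation is divided by the family size and multiplied by a homogeneity factor between 0 and 1; the function `member_confidence` and its parameters are hypothetical and are not the scoring actually used.

```python
def member_confidence(base_confidence, family_size, homogeneity=1.0):
    """Confidence that a drug interacts with one specific family member.

    Assumed scheme (not from the source): larger families and more
    heterogeneous families (lower `homogeneity`) yield lower confidence.
    """
    if family_size < 1:
        raise ValueError("family_size must be at least 1")
    if not 0.0 <= homogeneity <= 1.0:
        raise ValueError("homogeneity must lie between 0 and 1")
    return base_confidence * homogeneity / family_size
```

Under this assumption, a relation to a family of three proteins is spread over its members, and a heterogeneous family is penalized further, which matches the qualitative behaviour described above.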
In a last step, relations from other databases, namely DrugBank (3), KEGG (9), PDB (10), SuperLigands (11) and TTD (4), were checked for drug-target interactions not identified in the preceding steps. If those interactions could be confirmed by literature listed in PubMed, the references were included in SuperTarget; otherwise, the database describing the interaction is referenced.
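The referencing rule of this last step can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline actually used: the function `integrate_external_relations`, the data layout and the `PMID:`/`DB:` prefixes are hypothetical, and the drug names, target names and PubMed ID in the example are placeholders.

```python
def integrate_external_relations(known_relations, external_relations,
                                 pubmed_support):
    """Add relations from external databases that were not found before.

    known_relations:    set of (drug, target) pairs from the earlier steps
    external_relations: list of (drug, target, source_database) triples
    pubmed_support:     dict mapping (drug, target) to confirming PubMed IDs
    Returns a dict mapping each newly added pair to its list of references.
    """
    added = {}
    for drug, target, source_db in external_relations:
        pair = (drug, target)
        if pair in known_relations:
            continue  # already identified by text mining
        pmids = pubmed_support.get(pair, [])
        if pmids:
            # Confirmed by literature: cite the PubMed references.
            added[pair] = ["PMID:" + str(pmid) for pmid in pmids]
        else:
            # No literature confirmation: cite the describing database(s).
            refs = added.setdefault(pair, [])
            if "DB:" + source_db not in refs:
                refs.append("DB:" + source_db)
    return added

# Placeholder data, for illustration only.
example_known = {("drugA", "targetX")}
example_external = [
    ("drugA", "targetX", "DrugBank"),
    ("drugB", "targetY", "KEGG"),
    ("drugC", "targetZ", "TTD"),
    ("drugC", "targetZ", "PDB"),
]
example_support = {("drugB", "targetY"): [111]}
example_added = integrate_external_relations(
    example_known, example_external, example_support)
```

In this example, the first relation is skipped because it is already known, the second is stored with its literature reference, and the third falls back to the databases that describe it.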
Given the large number of entries, we cannot rule out that some of the data are erroneous, have changed over time or are too unspecific. In case of doubt, we refer to the referenced source of the relation.
To allow users to obtain more information on the drug-target relations, SuperTarget provides links to physicochemical properties and further structural information on drugs. Proven or potential target proteins are represented by sequences as stored in UniProt (12), by functional annotations extracted from GOA (13) and by related pathway information provided by KEGG (9) (compare Figure 1). Adverse drug reactions were extracted from the freely accessible Canadian Adverse Drug Reaction Monitoring Program (CADRMP, http://www.hc-sc.gc.ca/).
Figure 1. System architecture and number of database entries of SuperTarget. The database contains the complete UniProt with more than 3 million entries. Besides the targets, drugs and pathways, the database provides 23 000 different GO terms and 30 000 links to ...
For a subset of the drug-target relations, namely those for which our text-mining approach indicated a wealth of additional information, the type of binding was further analyzed, and direct and indirect interactions were manually distinguished. Indirect interactions can, for example, be caused by active metabolites of the drug or by changes in the expression of a protein. This extensively annotated subset, which is contained in Matador, should be well suited as a training set for various large-scale discovery approaches.