AMENDA is a subset of FRENDA comprising information on enzyme occurrence in organisms, subcellular localization and source tissue. It includes the most reliable organism-specific enzyme information from FRENDA (AMENDA Enzyme–Organism). In addition, it comprises data on the subcellular localization of enzymes (AMENDA Localization) and the source tissues in which the enzymes are active (AMENDA Source Tissue).
Whereas the information in FRENDA is based entirely on the co-occurrence of enzyme names and organism names in the title and abstract of a paper, AMENDA uses a refined and more rigorous text-mining procedure (3). The corresponding enzyme–organism combinations are stored in the new supplement AMENDA Enzyme–Organism (see ) and rated according to four reliability ranks, which reflect the degree of co-occurrence of the enzyme and organism names. The best rank is assigned when both the enzyme name and the organism name occur in the title and in the same sentence of the abstract and, additionally, the EC number is found in the abstract or among the MeSH terms.
As of June 2008, AMENDA Enzyme–Organism comprises more than 225 000 organism-specific enzyme hits.
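The best-rank criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AMENDA implementation: the function names, the naive sentence splitter and the case-insensitive substring matching are all hypothetical choices, and the criteria for the three lower ranks are not given here, so they are not modelled.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by
    # whitespace. Abbreviations such as "E. coli" would be split incorrectly.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def mentions_any(text, names):
    # Case-insensitive substring match (assumption); short names such as
    # "ADH" can produce false matches inside longer words.
    lowered = text.lower()
    return any(name.lower() in lowered for name in names)


def contains_ec(text, ec_number):
    # Match the EC number only when it is not part of a longer number,
    # so that "1.1.1.1" does not match "1.1.1.10".
    pattern = r"(?<![\d.])" + re.escape(ec_number) + r"(?!\d)"
    return re.search(pattern, text) is not None


def meets_best_rank(title, abstract, mesh_terms,
                    enzyme_names, organism_names, ec_number):
    # Condition 1: enzyme and organism name both occur in the title.
    in_title = (mentions_any(title, enzyme_names)
                and mentions_any(title, organism_names))
    # Condition 2: both occur together in at least one abstract sentence.
    same_sentence = any(
        mentions_any(s, enzyme_names) and mentions_any(s, organism_names)
        for s in split_sentences(abstract)
    )
    # Condition 3: the EC number is in the abstract or among the MeSH terms.
    ec_found = (contains_ec(abstract, ec_number)
                or any(contains_ec(term, ec_number) for term in mesh_terms))
    return in_title and same_sentence and ec_found
```

A record passes only if all three conditions hold at once, which is why the top rank is the most restrictive one.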
The tissues for AMENDA Source Tissue and the localizations for AMENDA Localization are obtained from the BRENDA Tissue Ontology (http://www.obofoundry.org/cgi-bin/detail.cgi?id=brenda) and the Gene Ontology (8), respectively [see and (3)]. From these two resources, dictionaries are constructed that are used in addition to the enzyme and organism name dictionaries employed for building FRENDA and AMENDA Enzyme–Organism (see above).
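As a rough illustration of how such dictionaries might be built and applied, the sketch below maps every name and synonym to a canonical term and scans a text for whole-word matches. The term lists are hypothetical toy entries (real ones would be derived from the BRENDA Tissue Ontology and the Gene Ontology), and `build_dictionary` and `find_terms` are names invented for this example.

```python
import re

# Hypothetical excerpts: canonical term -> list of synonyms.
TISSUE_TERMS = {"liver": ["hepatic tissue"], "leaf": ["leaves"]}
LOCALIZATION_TERMS = {"mitochondrion": ["mitochondria"], "cytosol": []}


def build_dictionary(terms):
    """Map every lower-cased name or synonym to its canonical term."""
    lookup = {}
    for canonical, synonyms in terms.items():
        for name in [canonical, *synonyms]:
            lookup[name.lower()] = canonical
    return lookup


def find_terms(text, lookup):
    """Return canonical terms whose name or synonym occurs in the text
    as a whole word or phrase (case-insensitive)."""
    found = set()
    for name, canonical in lookup.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(canonical)
    return found
```

Resolving synonyms to one canonical term means that "mitochondria" and "mitochondrion" count as the same localization when hits are aggregated.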
Thus, the underlying PubMed reference for each enzyme- and organism-specific hit in FRENDA is further analyzed for co-occurring localization and source tissue names or synonyms. Reliability ranks are assigned to AMENDA Source Tissue and to AMENDA Localization data according to the same principles as for AMENDA Enzyme–Organism. AMENDA Source Tissue contains more than 30 000 organism-specific hits and AMENDA Localization more than 60 000 (). The same manual evaluation as mentioned above yields a precision of 75% and a recall of 25% for enzyme–organism combinations when only the two highest reliability ranks (AMENDA reliability +++ and ++++) are considered. When the lower AMENDA reliability rank ++ is also included, the precision drops to 56%, whereas the recall increases to 55%.
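For readers less familiar with the two measures, the following sketch shows the standard definitions of precision and recall. The counts in the usage lines are hypothetical and were chosen only so that the result matches the 75% and 25% quoted above; they are not the counts from the manual evaluation.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Standard definitions:
    precision = correct hits / all hits returned,
    recall    = correct hits / all correct combinations that exist."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical counts: 100 hits returned, 75 of them correct,
# and 300 correct combinations present in the evaluated papers.
precision, recall = precision_recall(
    true_positives=75, false_positives=25, false_negatives=225
)
print(f"precision={precision:.0%}, recall={recall:.0%}")
```

Including a lower rank adds more hits, which typically raises recall and lowers precision, the trade-off reported for rank ++.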
As the BRENDA enzyme name dictionary contains on average approximately 10 different names per enzyme class (with several hundred names in use for some enzyme classes), the information content of FRENDA is much more complete than the result of a simple PubMed search using the one or two names for an enzyme known to the scientist.