Search tips
Search criteria

Results 1-5 (5)

Clipboard (0)

Select a Filter Below

Year of Publication
Document Types
author:("akata, Maria")
1.  Dynamic enhancement of drug product labels to support drug safety, efficacy, and effectiveness 
Out-of-date or incomplete drug product labeling information may increase the risk of otherwise preventable adverse drug events. In recognition of these concerns, the United States Federal Drug Administration (FDA) requires drug product labels to include specific information. Unfortunately, several studies have found that drug product labeling fails to keep current with the scientific literature. We present a novel approach to addressing this issue. The primary goal of this novel approach is to better meet the information needs of persons who consult the drug product label for information on a drug’s efficacy, effectiveness, and safety. Using FDA product label regulations as a guide, the approach links drug claims present in drug information sources available on the Semantic Web with specific product label sections. Here we report on pilot work that establishes the baseline performance characteristics of a proof-of-concept system implementing the novel approach. Claims from three drug information sources were linked to the Clinical Studies, Drug Interactions, and Clinical Pharmacology sections of the labels for drug products that contain one of 29 psychotropic drugs. The resulting Linked Data set maps 409 efficacy/effectiveness study results, 784 drug-drug interactions, and 112 metabolic pathway assertions derived from three clinically-oriented drug information sources (, the National Drug File – Reference Terminology, and the Drug Interaction Knowledge Base) to the sections of 1,102 product labels. Proof-of-concept web pages were created for all 1,102 drug product labels that demonstrate one possible approach to presenting information that dynamically enhances drug product labeling. We found that approximately one in five efficacy/effectiveness claims were relevant to the Clinical Studies section of a psychotropic drug product, with most relevant claims providing new information. We also identified several cases where all of the drug-drug interaction claims linked to the Drug Interactions section for a drug were potentially novel. The baseline performance characteristics of the proof-of-concept will enable further technical and user-centered research on robust methods for scaling the approach to the many thousands of product labels currently on the market.
PMCID: PMC3698101  PMID: 23351881
Regulatory science; Drug information services; Drug labeling; Linked data; Scientific discourse ontologies; Drug interactions; Pharmacokinetics; Treatment efficacy; Treatment effectiveness; Comparative effectiveness research
2.  Automatic recognition of conceptualization zones in scientific articles and two life science applications 
Bioinformatics  2012;28(7):991-1000.
Motivation: Scholarly biomedical publications report on the findings of a research investigation. Scientists use a well-established discourse structure to relate their work to the state of the art, express their own motivation and hypotheses and report on their methods, results and conclusions. In previous work, we have proposed ways to explicitly annotate the structure of scientific investigations in scholarly publications. Here we present the means to facilitate automatic access to the scientific discourse of articles by automating the recognition of 11 categories at the sentence level, which we call Core Scientific Concepts (CoreSCs). These include: Hypothesis, Motivation, Goal, Object, Background, Method, Experiment, Model, Observation, Result and Conclusion. CoreSCs provide the structure and context to all statements and relations within an article and their automatic recognition can greatly facilitate biomedical information extraction by characterizing the different types of facts, hypotheses and evidence available in a scientific publication.
Results: We have trained and compared machine learning classifiers (support vector machines and conditional random fields) on a corpus of 265 full articles in biochemistry and chemistry to automatically recognize CoreSCs. We have evaluated our automatic classifications against a manually annotated gold standard, and have achieved promising accuracies with ‘Experiment’, ‘Background’ and ‘Model’ being the categories with the highest F1-scores (76%, 62% and 53%, respectively). We have analysed the task of CoreSC annotation both from a sentence classification as well as sequence labelling perspective and we present a detailed feature evaluation. The most discriminative features are local sentence features such as unigrams, bigrams and grammatical dependencies while features encoding the document structure, such as section headings, also play an important role for some of the categories. We discuss the usefulness of automatically generated CoreSCs in two biomedical applications as well as work in progress.
Availability: A web-based tool for the automatic annotation of articles with CoreSCs and corresponding documentation is available online at also contains detailed information pertaining to CoreSC annotation and links to annotation guidelines as well as a corpus of manually annotated articles, which served as our training data.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3315721  PMID: 22321698
3.  On the formalization and reuse of scientific research 
The reuse of scientific knowledge obtained from one investigation in another investigation is basic to the advance of science. Scientific investigations should therefore be recorded in ways that promote the reuse of the knowledge they generate. The use of logical formalisms to describe scientific knowledge has potential advantages in facilitating such reuse. Here, we propose a formal framework for using logical formalisms to promote reuse. We demonstrate the utility of this framework by using it in a worked example from biology: demonstrating cycles of investigation formalization [F] and reuse [R] to generate new knowledge. We first used logic to formally describe a Robot scientist investigation into yeast (Saccharomyces cerevisiae) functional genomics [f1]. With Robot scientists, unlike human scientists, the production of comprehensive metadata about their investigations is a natural by-product of the way they work. We then demonstrated how this formalism enabled the reuse of the research in investigating yeast phenotypes [r1 = R(f1)]. This investigation found that the removal of non-essential enzymes generally resulted in enhanced growth. The phenotype investigation was then formally described using the same logical formalism as the functional genomics investigation [f2 = F(r1)]. We then demonstrated how this formalism enabled the reuse of the phenotype investigation to investigate yeast systems-biology modelling [r2 = R(f2)]. This investigation found that yeast flux-balance analysis models fail to predict the observed changes in growth. Finally, the systems biology investigation was formalized for reuse in future investigations [f3 = F(r2)]. These cycles of reuse are a model for the general reuse of scientific knowledge.
PMCID: PMC3163424  PMID: 21490004
semantic web; logic; Saccharomyces cerevisiae; ontology
4.  A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment 
BMC Bioinformatics  2011;12:69.
Many practical tasks in biomedicine require accessing specific types of information in scientific literature; e.g. information about the results or conclusions of the study in question. Several schemes have been developed to characterize such information in scientific journal articles. For example, a simple section-based scheme assigns individual sentences in abstracts under sections such as Objective, Methods, Results and Conclusions. Some schemes of textual information structure have proved useful for biomedical text mining (BIO-TM) tasks (e.g. automatic summarization). However, user-centered evaluation in the context of real-life tasks has been lacking.
We take three schemes of different type and granularity - those based on section names, Argumentative Zones (AZ) and Core Scientific Concepts (CoreSC) - and evaluate their usefulness for a real-life task which focuses on biomedical abstracts: Cancer Risk Assessment (CRA). We annotate a corpus of CRA abstracts according to each scheme, develop classifiers for automatic identification of the schemes in abstracts, and evaluate both the manual and automatic classifications directly as well as in the context of CRA.
Our results show that for each scheme, the majority of categories appear in abstracts, although two of the schemes (AZ and CoreSC) were developed originally for full journal articles. All the schemes can be identified in abstracts relatively reliably using machine learning. Moreover, when cancer risk assessors are presented with scheme annotated abstracts, they find relevant information significantly faster than when presented with unannotated abstracts, even when the annotations are produced using an automatic classifier. Interestingly, in this user-based evaluation the coarse-grained scheme based on section names proved nearly as useful for CRA as the finest-grained CoreSC scheme.
We have shown that existing schemes aimed at capturing information structure of scientific documents can be applied to biomedical abstracts and can be identified in them automatically with an accuracy which is high enough to benefit a real-life task in biomedicine.
PMCID: PMC3060841  PMID: 21385430
5.  Towards Robot Scientists for autonomous scientific discovery 
We review the main components of autonomous scientific discovery, and how they lead to the concept of a Robot Scientist. This is a system which uses techniques from artificial intelligence to automate all aspects of the scientific discovery process: it generates hypotheses from a computer model of the domain, designs experiments to test these hypotheses, runs the physical experiments using robotic systems, analyses and interprets the resulting data, and repeats the cycle. We describe our two prototype Robot Scientists: Adam and Eve. Adam has recently proven the potential of such systems by identifying twelve genes responsible for catalysing specific reactions in the metabolic pathways of the yeast Saccharomyces cerevisiae. This work has been formally recorded in great detail using logic. We argue that the reporting of science needs to become fully formalised and that Robot Scientists can help achieve this. This will make scientific information more reproducible and reusable, and promote the integration of computers in scientific reasoning. We believe the greater automation of both the physical and intellectual aspects of scientific investigations to be essential to the future of science. Greater automation improves the accuracy and reliability of experiments, increases the pace of discovery and, in common with conventional laboratory automation, removes tedious and repetitive tasks from the human scientist.
PMCID: PMC2813846  PMID: 20119518

Results 1-5 (5)