1.  Benchmarking Ontologies: Bigger or Better? 
PLoS Computational Biology  2011;7(1):e1001055.
A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them.
Author Summary
An ontology represents the concepts and their interrelation within a knowledge domain. Several ontologies have been developed in biomedicine, which provide standardized vocabularies to describe diseases, genes and gene products, physiological phenotypes, anatomical structures, and many other phenomena. Scientists use them to encode the results of complex experiments and observations and to perform integrative analysis to discover new knowledge. A remaining challenge in ontology development is how to evaluate an ontology's representation of knowledge within its scientific domain. Building on classic measures from information retrieval, we introduce a family of metrics including breadth and depth that capture the conceptual coverage and parsimony of an ontology. We test these measures using (1) four commonly used medical ontologies in relation to a corpus of medical documents and (2) seven popular English thesauri (ontologies of synonyms) with respect to text from medicine, news, and novels. Results demonstrate that both medical ontologies and English thesauri have a small overlap in concepts and relations. Our methods suggest efforts to tighten the fit between ontologies and biomedical knowledge.
doi:10.1371/journal.pcbi.1001055
PMCID: PMC3020923  PMID: 21249231
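The breadth notion described in this abstract — how completely an ontology's vocabulary covers the terms actually used in a domain corpus — can be illustrated with a simple set-overlap computation. This is a minimal sketch of the general idea, not the authors' exact metric (which also measures depth and builds on information-retrieval measures); the term lists below are invented.

```python
def coverage(ontology_terms, corpus_tokens):
    """Fraction of distinct corpus terms that appear in the ontology vocabulary.
    A toy stand-in for a corpus-based breadth measure."""
    vocab = {t.lower() for t in ontology_terms}
    used = {t.lower() for t in corpus_tokens}
    return len(used & vocab) / len(used)

# Hypothetical ontology vocabulary and document tokens
ontology = ["fever", "anemia", "malaria", "parasite"]
corpus = ["fever", "chills", "anemia", "fever"]
print(coverage(ontology, corpus))  # 2 of 3 distinct corpus terms are covered
```

A real evaluation would tokenize and normalize multi-word terms from documents such as MEDLINE abstracts rather than compare bare word lists.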
2.  Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation 
PLoS Biology  2009;7(11):e1000247.
A novel method for quantifying the similarity between phenotypes by the use of ontologies can be used to search for candidate genes, pathway members, and human disease models on the basis of phenotypes alone.
Scientists and clinicians who study genetic alterations and disease have traditionally described phenotypes in natural language. The considerable variation in these free-text descriptions has posed a hindrance to the important task of identifying candidate genes and models for human diseases and indicates the need for a computationally tractable method to mine data resources for mutant phenotypes. In this study, we tested the hypothesis that ontological annotation of disease phenotypes will facilitate the discovery of new genotype-phenotype relationships within and across species. To describe phenotypes using ontologies, we used an Entity-Quality (EQ) methodology, wherein the affected entity (E) and how it is affected (Q) are recorded using terms from a variety of ontologies. Using this EQ method, we annotated the phenotypes of 11 gene-linked human diseases described in Online Mendelian Inheritance in Man (OMIM). These human annotations were loaded into our Ontology-Based Database (OBD) along with other ontology-based phenotype descriptions of mutants from various model organism databases. Phenotypes recorded with this EQ method can be computationally compared based on the hierarchy of terms in the ontologies and the frequency of annotation. We utilized four similarity metrics to compare phenotypes and developed an ontology of homologous and analogous anatomical structures to compare phenotypes between species. Using these tools, we demonstrate that we can identify, through the similarity of the recorded phenotypes, other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species. We conclude that EQ-based annotation of phenotypes, in conjunction with a cross-species ontology, and a variety of similarity metrics can identify biologically meaningful similarities between genes by comparing phenotypes alone. 
This annotation and search method provides a novel and efficient means to identify gene candidates and animal models of human disease, which may shorten the lengthy path to identification and understanding of the genetic basis of human disease.
Author Summary
Model organisms such as fruit flies, mice, and zebrafish are useful for investigating gene function because they are easy to grow, dissect, and genetically manipulate in the laboratory. By examining mutations in these organisms, one can identify candidate genes that cause disease in humans, and develop models to better understand human disease and gene function. A fundamental roadblock for analysis is, however, the lack of a computational method for describing and comparing phenotypes of mutant animals and of human diseases when the genetic basis is unknown. We describe here a novel method using ontologies to record and quantify the similarity between phenotypes. We tested our method by using the annotated mutant phenotype of one member of the Hedgehog signaling pathway in zebrafish to identify other pathway members with similar recorded phenotypes. We also compared human disease phenotypes to those produced by mutation in model organisms, and show that orthologous and biologically relevant genes can be identified by this method. Given that the genetic basis of human disease is often unknown, this method provides a means for identifying candidate genes, pathway members, and disease models by computationally identifying similar phenotypes within and across species.
doi:10.1371/journal.pbio.1000247
PMCID: PMC2774506  PMID: 19956802
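The core idea above — comparing phenotypes as sets of Entity-Quality (EQ) annotations — can be sketched with a Jaccard set similarity. This toy version omits what the paper's metrics rely on: the ontology term hierarchy, annotation frequency, and the homology ontology that would map, for example, zebrafish pectoral fin to mouse forelimb. The annotation pairs are invented.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of (Entity, Quality) annotations.
    Exact-match only; a real comparison would also credit ontologically
    related or homologous entities."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

fish = [("pectoral fin", "absent"), ("somite", "decreased size")]
mouse = [("forelimb", "absent"), ("somite", "decreased size")]
print(jaccard(fish, mouse))  # 1 shared EQ pair out of 3 distinct pairs
```

Note that the fin/forelimb pair scores zero here; it is precisely this gap that the cross-species anatomy ontology described in the paper is designed to close.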
3.  Mapping between the OBO and OWL ontology languages 
Journal of Biomedical Semantics  2011;2(Suppl 1):S3.
Background
Ontologies are commonly used in biomedicine to organize concepts to describe domains such as anatomies, environments, experiments, taxonomies, etc. NCBO BioPortal currently hosts about 180 different biomedical ontologies. These ontologies have been mainly expressed in either the Open Biomedical Ontology (OBO) format or the Web Ontology Language (OWL). OBO emerged from the Gene Ontology, and supports most of the biomedical ontology content. In comparison, OWL is a Semantic Web language, and is supported by the World Wide Web Consortium together with integral query languages, rule languages and distributed infrastructure for information interchange. These features are highly desirable for the OBO content as well. A convenient method for leveraging these features for OBO ontologies is by transforming OBO ontologies to OWL.
Results
We have developed a methodology for translating OBO ontologies to OWL using the organization of the Semantic Web itself to guide the work. The approach reveals that the constructs of OBO can be grouped together to form a similar layer cake. Thus we were able to decompose the problem into two parts. Most OBO constructs have easy and obvious equivalence to a construct in OWL. A small subset of OBO constructs requires deeper consideration. We have defined transformations for all constructs in an effort to foster a standard common mapping between OBO and OWL. Our mapping produces OWL-DL, a Description Logics based subset of OWL with desirable computational properties for efficiency and correctness. Our Java implementation of the mapping is part of the official Gene Ontology project source.
Conclusions
Our transformation system provides a lossless roundtrip mapping for OBO ontologies, i.e. an OBO ontology may be translated to OWL and back without loss of knowledge. In addition, it provides a roadmap for bridging the gap between the two ontology languages in order to enable the use of ontology content in a language independent manner.
doi:10.1186/2041-1480-2-S1-S3
PMCID: PMC3105495  PMID: 21388572
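The "easy and obvious" part of the OBO-to-OWL mapping — term stanzas becoming class declarations, labels, and subclass axioms — can be illustrated with a toy translator. Both the stanza representation and the output syntax are simplified for illustration; the official mapping in the Gene Ontology project source handles many more constructs (synonyms, cross-products, relationship types, and so on).

```python
def obo_term_to_owl(stanza):
    """Translate a minimal OBO [Term] stanza (given here as a dict of
    tag -> value) into OWL functional-style axioms. Illustrative only."""
    axioms = [f"Declaration(Class({stanza['id']}))"]
    if "name" in stanza:
        axioms.append(f'AnnotationAssertion(rdfs:label {stanza["id"]} "{stanza["name"]}")')
    if "is_a" in stanza:
        # OBO is_a maps directly to an OWL SubClassOf axiom
        axioms.append(f"SubClassOf({stanza['id']} {stanza['is_a']})")
    return axioms

term = {"id": "GO:0005737", "name": "cytoplasm", "is_a": "GO:0005575"}
for ax in obo_term_to_owl(term):
    print(ax)
```

The harder cases mentioned in the abstract (e.g. constructs without a direct OWL-DL counterpart) are exactly the ones such a direct tag-by-tag translation cannot handle.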
4.  Predicting the Extension of Biomedical Ontologies 
PLoS Computational Biology  2012;8(9):e1002630.
Developing and extending a biomedical ontology is a very demanding task that can never be considered complete given our ever-evolving understanding of the life sciences. Extension in particular can benefit from the automation of some of its steps, thus releasing experts to focus on harder tasks. Here we present a strategy to support the automation of change capturing within ontology extension where the need for new concepts or relations is identified. Our strategy is based on predicting areas of an ontology that will undergo extension in a future version by applying supervised learning over features of previous ontology versions. We used the Gene Ontology as our test bed and obtained encouraging results with average f-measure reaching 0.79 for a subset of biological process terms. Our strategy was also able to outperform state-of-the-art change-capturing methods. In addition we have identified several issues concerning prediction of ontology evolution, and have delineated a general framework for ontology extension prediction. Our strategy can be applied to any biomedical ontology with versioning, to help focus either manual or semi-automated extension methods on areas of the ontology that need extension.
Author Summary
Biomedical knowledge is complex and in constant evolution and growth, making it difficult for researchers to keep up with novel discoveries. Ontologies have become essential to help with this issue since they provide a standardized format to describe knowledge that facilitates its storing, sharing and computational analysis. However, the effort to keep a biomedical ontology up-to-date is a demanding and costly task involving several experts. Much of this effort is dedicated to the addition of new elements to extend the ontology to cover new areas of knowledge. We have developed an automated methodology to identify areas of the ontology that need extension based on past versions of the ontology as well as external data such as references in scientific literature and ontology usage. This can be a valuable help to semi-automated ontology extension systems, since they can focus on the subdomains of the identified ontology areas thus reducing the amount of information to process, which in turn releases ontology developers to focus on more complex ontology evolution tasks. By contributing to a faster rate of ontology evolution, we hope to positively impact ontology-based applications such as natural language processing, computer reasoning, information integration or semantic querying of heterogenous data.
doi:10.1371/journal.pcbi.1002630
PMCID: PMC3441454  PMID: 23028267
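The supervised-learning setup described above can be sketched as a two-stage pipeline: extract per-term features from consecutive ontology versions, then classify which terms will be extended. The feature set and decision rule below are invented placeholders standing in for the paper's richer features and trained classifier; the ontology versions are toy dicts mapping each term to its child terms.

```python
def extension_features(term, prev, curr):
    """Illustrative features for one term, computed from two ontology
    versions: current child count and growth since the previous version."""
    return {
        "children": len(curr.get(term, [])),
        "growth": len(curr.get(term, [])) - len(prev.get(term, [])),
    }

def predict_extension(feats, growth_threshold=1):
    """Toy decision rule standing in for the trained classifier: areas
    that gained children recently are predicted to be extended again."""
    return feats["growth"] >= growth_threshold

# Two hypothetical ontology versions: term -> list of child terms
v1 = {"GO:1": ["GO:2"]}
v2 = {"GO:1": ["GO:2", "GO:3", "GO:4"]}
feats = extension_features("GO:1", v1, v2)
print(predict_extension(feats))  # True: GO:1 gained children between versions
```

In the actual study the label for training comes from a later version (was the term's area in fact extended?), which is what makes the problem supervised.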
5.  A UML profile for the OBO relation ontology 
BMC Genomics  2012;13(Suppl 5):S3.
Background
Ontologies have increasingly been used in the biomedical domain, which has prompted the emergence of different initiatives to facilitate their development and integration. The Open Biological and Biomedical Ontologies (OBO) Foundry consortium provides a repository of life-science ontologies, which are developed according to a set of shared principles. This consortium has developed an ontology called OBO Relation Ontology aiming at standardizing the different types of biological entity classes and associated relationships. Since ontologies are primarily intended to be used by humans, the use of graphical notations for ontology development facilitates the capture, comprehension and communication of knowledge between its users. However, OBO Foundry ontologies are captured and represented basically using text-based notations. The Unified Modeling Language (UML) provides a standard and widely-used graphical notation for modeling computer systems. UML provides a well-defined set of modeling elements, which can be extended using a built-in extension mechanism named Profile. Thus, this work aims at developing a UML profile for the OBO Relation Ontology to provide a domain-specific set of modeling elements that can be used to create standard UML-based ontologies in the biomedical domain.
Results
We have studied the OBO Relation Ontology, the UML metamodel and the UML profiling mechanism. Based on these studies, we have proposed an extension to the UML metamodel in conformance with the OBO Relation Ontology and we have defined a profile that implements the extended metamodel. Finally, we have applied the proposed UML profile in the development of a number of fragments from different ontologies. Particularly, we have considered the Gene Ontology (GO), the PRotein Ontology (PRO) and the Xenopus Anatomy and Development Ontology (XAO).
Conclusions
The use of an established and well-known graphical language in the development of biomedical ontologies provides a more intuitive form of capturing and representing knowledge than using only text-based notations. The use of the profile requires the domain expert to reason about the underlying semantics of the concepts and relationships being modeled, which helps prevent the introduction of inconsistencies in an ontology under development and facilitates the identification and correction of errors in an already defined ontology.
doi:10.1186/1471-2164-13-S5-S3
PMCID: PMC3477006  PMID: 23095840
6.  The lexical properties of the Gene Ontology 
The Gene Ontology (GO) is a construct developed for the purpose of annotating molecular information about genes and their products. The ontology is a shared resource developed by the GO Consortium, a group of scientists who work on a variety of model organisms. In this paper we investigate the nature of the strings found in the Gene Ontology and evaluate them for their usefulness in natural language processing (NLP). We extend previous work that identified a set of properties that reliably identifies natural language phrases in the Unified Medical Language System (UMLS). The results indicate that a large percentage (79%) of GO terms are potentially useful for NLP applications. Some 35% of the GO terms were found in a corpus derived from the MEDLINE bibliographic database, and 27% of the terms were found in the current edition of the UMLS.
PMCID: PMC2244431  PMID: 12463875
7.  A Semantic Problem Solving Environment for Integrative Parasite Research: Identification of Intervention Targets for Trypanosoma cruzi 
Background
Research on the biology of parasites requires a sophisticated and integrated computational platform to query and analyze large volumes of data, representing both unpublished (internal) and public (external) data sources. Effective analysis of an integrated data resource using knowledge discovery tools would significantly aid biologists in conducting their research, for example, through identifying various intervention targets in parasites and in deciding the future direction of ongoing as well as planned projects. A key challenge in achieving this objective is the heterogeneity between the internal lab data, usually stored as flat files, Excel spreadsheets or custom-built databases, and the external databases. Reconciling the different forms of heterogeneity and effectively integrating data from disparate sources is a nontrivial task for biologists and requires a dedicated informatics infrastructure. Thus, we developed an integrated environment using Semantic Web technologies that may provide biologists the tools for managing and analyzing their data, without the need for acquiring in-depth computer science knowledge.
Methodology/Principal Findings
We developed a semantic problem-solving environment (SPSE) that uses ontologies to integrate internal lab data with external resources in a Parasite Knowledge Base (PKB), which has the ability to query across these resources in a unified manner. The SPSE includes Web Ontology Language (OWL)-based ontologies, experimental data with its provenance information represented using the Resource Description Format (RDF), and a visual querying tool, Cuebee, that features integrated use of Web services. We demonstrate the use and benefit of SPSE using example queries for identifying gene knockout targets of Trypanosoma cruzi for vaccine development. Answers to these queries involve looking up multiple sources of data, linking them together and presenting the results.
Conclusion/Significance
The SPSE helps parasitologists leverage the growing, but disparate, parasite data resources by offering an integrative platform that utilizes Semantic Web techniques, while keeping the increase in their workload minimal.
Author Summary
Effective research in parasite biology requires analyzing experimental lab data in the context of constantly expanding public data resources. Integrating lab data with public resources is particularly difficult for biologists who may not possess significant computational skills to acquire and process heterogeneous data stored at different locations. Therefore, we develop a semantic problem solving environment (SPSE) that allows parasitologists to query their lab data integrated with public resources using ontologies. An ontology specifies a common vocabulary and formal relationships among the terms that describe an organism, and experimental data and processes in this case. SPSE supports capturing and querying provenance information, which is metadata on the experimental processes and data recorded for reproducibility, and includes a visual query-processing tool to formulate complex queries without learning the query language syntax. We demonstrate the significance of SPSE in identifying gene knockout targets for T. cruzi. The overall goal of SPSE is to help researchers discover new or existing knowledge that is implicitly present in the data but not always easily detected. Results demonstrate improved usefulness of SPSE over existing lab systems and approaches, and support for complex query design that is otherwise difficult to achieve without the knowledge of query language syntax.
doi:10.1371/journal.pntd.0001458
PMCID: PMC3260319  PMID: 22272365
8.  Understanding and using the meaning of statements in a bio-ontology: recasting the Gene Ontology in OWL 
BMC Bioinformatics  2007;8:57.
The bio-ontology community falls into two camps: first we have biology domain experts, who actually hold the knowledge we wish to capture in ontologies; second, we have ontology specialists, who hold knowledge about techniques and best practice on ontology development. In the bio-ontology domain, these two camps have often come into conflict, especially where pragmatism comes into conflict with perceived best practice. One of these areas is the insistence of computer scientists on a well-defined semantic basis for the Knowledge Representation language being used. In this article, we will first describe why this community is so insistent. Second, we will illustrate this by examining the semantics of the Web Ontology Language and the semantics placed on the Directed Acyclic Graph as used by the Gene Ontology. Finally we will reconcile the two representations, including the broader Open Biomedical Ontologies format. The ability to exchange between the two representations means that we can capitalise on the features of both languages. Such utility can only arise by the understanding of the semantics of the languages being used. By this illustration of the usefulness of a clear, well-defined language semantics, we wish to promote a wider understanding of the computer science perspective amongst potential users within the biological community.
doi:10.1186/1471-2105-8-57
PMCID: PMC1819394  PMID: 17311682
9.  A natural language interface plug-in for cooperative query answering in biological databases 
BMC Genomics  2012;13(Suppl 3):S4.
Background
One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural language like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schema with an aim to providing cooperative responses to queries tailored to users' interpretations.
Results
Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from underlying database schema.
Conclusions
The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.
doi:10.1186/1471-2164-13-S3-S4
PMCID: PMC3323828  PMID: 22759613
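The transformation this abstract describes — a natural-language question rewritten into a semantically equivalent structured query — can be sketched at its simplest as rule-based pattern matching. The rules, schema, and table names below are entirely hypothetical and stand in for the paper's plug-in, which additionally uses full parsing, ontologies, and modifiable knowledge rules to stay independent of the underlying schema.

```python
import re

# Toy rules mapping question patterns to SQL templates over an invented schema
RULES = {
    r"genes? .* pathway (\w+)": "SELECT gene FROM pathway_members WHERE pathway = '{0}'",
    r"diseases? .* gene (\w+)": "SELECT disease FROM gene_disease WHERE gene = '{0}'",
}

def to_sql(question):
    """Rewrite a natural-language question into SQL via the first matching
    pattern rule; return None when no rule applies."""
    for pattern, template in RULES.items():
        m = re.search(pattern, question, re.IGNORECASE)
        if m:
            return template.format(*m.groups())
    return None

print(to_sql("Which genes are in pathway hedgehog?"))
```

Because the rules are data rather than code, new question forms or schema changes can be supported by editing `RULES` — a miniature version of the schema-independence argument made in the abstract.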
10.  The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology 
Nucleic Acids Research  2004;32(Database issue):D262-D266.
The Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA) aims to provide high-quality electronic and manual annotations to the UniProt Knowledgebase (Swiss-Prot, TrEMBL and PIR-PSD) using the standardized vocabulary of the Gene Ontology (GO). As a supplementary archive of GO annotation, GOA promotes a high level of integration of the knowledge represented in UniProt with other databases. This is achieved by converting UniProt annotation into a recognized computational format. GOA provides annotated entries for nearly 60 000 species (GOA-SPTr) and is the largest and most comprehensive open-source contributor of annotations to the GO Consortium annotation effort. By integrating GO annotations from other model organism groups, GOA consolidates specialized knowledge and expertise to ensure the data remain a key reference for up-to-date biological information. Furthermore, the GOA database fully endorses the Human Proteomics Initiative by prioritizing the annotation of proteins likely to benefit human health and disease. In addition to a non-redundant set of annotations to the human proteome (GOA-Human) and monthly releases of its GO annotation for all species (GOA-SPTr), a series of GO mapping files and specific cross-references in other databases are also regularly distributed. GOA can be queried through a simple user-friendly web interface or downloaded in a parsable format via the EBI and GO FTP websites. The GOA data set can be used to enhance the annotation of particular model organism or gene expression data sets, although increasingly it has been used to evaluate GO predictions generated from text mining or protein interaction experiments. In 2004, the GOA team will build on its success and will continue to supplement the functional annotation of UniProt and work towards enhancing the ability of scientists to access all available biological information. 
Researchers wishing to query or contribute to the GOA project are encouraged to email: goa@ebi.ac.uk.
doi:10.1093/nar/gkh021
PMCID: PMC308756  PMID: 14681408
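The GOA annotation sets distributed via the EBI and GO FTP sites use the tab-delimited GO Annotation File (GAF) format, with `!`-prefixed comment lines. A minimal reader can be sketched as below; the sample line is fabricated, and only a few of the format's columns are pulled out here.

```python
def parse_gaf(lines):
    """Parse GO Annotation File (GAF) lines: tab-separated fields,
    '!' comment lines skipped. Extracts a subset of columns only."""
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        yield {"db_object_id": cols[1], "symbol": cols[2],
               "go_id": cols[4], "evidence": cols[6]}

# Fabricated example annotation (not a real GOA record)
sample = [
    "!gaf-version: 2.1",
    "UniProtKB\tP12345\tABC1\t\tGO:0005737\tPMID:123\tIEA\t\tC\t\t\tprotein\ttaxon:9606\t20240101\tUniProt",
]
for ann in parse_gaf(sample):
    print(ann["symbol"], ann["go_id"], ann["evidence"])  # ABC1 GO:0005737 IEA
```

Evidence codes such as `IEA` (inferred from electronic annotation) versus manually curated codes are what distinguish the electronic and manual annotation streams the abstract mentions.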
11.  The Gene Ontology (GO) Cellular Component Ontology: integration with SAO (Subcellular Anatomy Ontology) and other recent developments 
Background
The Gene Ontology (GO) (http://www.geneontology.org/) contains a set of terms for describing the activity and actions of gene products across all kingdoms of life. Each of these activities is executed in a location within a cell or in the vicinity of a cell. In order to capture this context, the GO includes a sub-ontology called the Cellular Component (CC) ontology (GO-CCO). The primary use of this ontology is for GO annotation, but it has also been used for phenotype annotation, and for the annotation of images. Another ontology with similar scope to the GO-CCO is the Subcellular Anatomy Ontology (SAO), part of the Neuroscience Information Framework Standard (NIFSTD) suite of ontologies. The SAO also covers cell components, but in the domain of neuroscience.
Description
Recently, the GO-CCO was enriched in content and links to the Biological Process and Molecular Function branches of GO as well as to other ontologies. This was achieved in several ways. We carried out an amalgamation of SAO terms with GO-CCO ones; as a result, nearly 100 new neuroscience-related terms were added to the GO. The GO-CCO also contains relationships to GO Biological Process and Molecular Function terms, as well as connecting to external ontologies such as the Cell Ontology (CL). Terms representing protein complexes in the Protein Ontology (PRO) reference GO-CCO terms for their species-generic counterparts. GO-CCO terms can also be used to search a variety of databases.
Conclusions
In this publication we provide an overview of the GO-CCO, its overall design, and some recent extensions that make use of additional spatial information. One of the most recent developments of the GO-CCO was the merging in of the SAO, resulting in a single unified ontology designed to serve the needs of GO annotators as well as the specific needs of the neuroscience community.
doi:10.1186/2041-1480-4-20
PMCID: PMC3852282  PMID: 24093723
Gene ontology; Cellular component ontology; Subcellular anatomy ontology; Neuroscience; Annotation; Ontology language; Ontology integration; Neuroscience information framework
12.  The Role of the Toxicologic Pathologist in the Post-Genomic Era# 
Journal of Toxicologic Pathology  2013;26(2):105-110.
An era can be defined as a period in time identified by distinctive character, events, or practices. We are now in the genomic era.

The pre-genomic era: There was a pre-genomic era. It started many years ago with novel and seminal animal experiments, primarily directed at studying cancer. It is marked by the development of the two-year rodent cancer bioassay and the ultimate realization that alternative approaches and short-term animal models were needed to replace this resource-intensive and time-consuming method for predicting human health risk. Many alternative approaches and short-term animal models were proposed and tried but, to date, none have completely replaced our dependence upon the two-year rodent bioassay. However, the alternative approaches and models themselves have made tangible contributions to basic research, clinical medicine and to our understanding of cancer and they remain useful tools to address hypothesis-driven research questions. The pre-genomic era was a time when toxicologic pathologists played a major role in drug development, evaluating the cancer bioassay and the associated dose-setting toxicity studies, and exploring the utility of proposed alternative animal models. It was a time when there was a shortage of qualified toxicologic pathologists.

The genomic era: We are in the genomic era. It is a time when the genetic underpinnings of normal biological and pathologic processes are being discovered and documented. It is a time for sequencing entire genomes and deliberately silencing relevant segments of the mouse genome to see what each segment controls and if that silencing leads to increased susceptibility to disease. What remains to be charted in this genomic era is the complex interaction of genes, gene segments, post-translational modifications of encoded proteins, and environmental factors that affect genomic expression. 
In this current genomic era, the toxicologic pathologist has had to make room for a growing population of molecular biologists. In this present era newly emerging DVM and MD scientists enter the work arena with a PhD in pathology often based on some aspect of molecular biology or molecular pathology research. In molecular biology, the almost daily technological advances require one’s complete dedication to remain at the cutting edge of the science. Similarly, the practice of toxicologic pathology, like other morphological disciplines, is based largely on experience and requires dedicated daily examination of pathology material to maintain a well-trained eye capable of distilling specific information from stained tissue slides - a dedicated effort that cannot be well done as an intermezzo between other tasks. It is a rare individual that has true expertise in both molecular biology and pathology. In this genomic era, the newly emerging DVM-PhD or MD-PhD pathologist enters a marketplace without many job opportunities in contrast to the pre-genomic era. Many face an identity crisis needing to decide to become a competent pathologist or, alternatively, to become a competent molecular biologist. At the same time, more PhD molecular biologists without training in pathology are members of the research teams working in drug development and toxicology. How best can the toxicologic pathologist interact in the contemporary team approach in drug development, toxicology research and safety testing? Based on their biomedical training, toxicologic pathologists are in an ideal position to link data from the emerging technologies with their knowledge of pathobiology and toxicology. To enable this linkage and obtain the synergy it provides, the bench-level, slide-reading expert pathologist will need to have some basic understanding and appreciation of molecular biology methods and tools. 
On the other hand, it is not likely that the typical molecular biologist could competently evaluate and diagnose stained tissue slides from a toxicology study or a cancer bioassay.

The post-genomic era: The post-genomic era will likely arrive around 2050, at which time entire genomes from multiple species will exist in massive databases, data from thousands of robotic high-throughput chemical screenings will exist in other databases, and genetic toxicity and chemical structure-activity relationships will reside in yet other databases. All databases will be linked and relevant information will be extracted and analyzed by appropriate algorithms following input of the latest molecular, submolecular, genetic, experimental, pathology and clinical data. Knowledge gained will permit the genetic components of many diseases to be amenable to therapeutic prevention and/or intervention. Much like computerized algorithms are currently used to forecast weather or to predict political elections, computerized sophisticated algorithms based largely on scientific data mining will categorize new drugs and chemicals relative to their health benefits versus their health risks for defined human populations and subpopulations. However, this form of a virtual toxicity study or cancer bioassay will only identify probabilities of adverse consequences from interaction of particular environmental and/or chemical/drug exposure(s) with specific genomic variables. Proof in many situations will require confirmation in intact in vivo mammalian animal models. The toxicologic pathologist in the post-genomic era will be the best suited scientist to confirm the data mining and its probability predictions for safety or adverse consequences with the actual tissue morphological features in test species that define specific test agent pathobiology and human health risk.
doi:10.1293/tox.26.105
PMCID: PMC3695332  PMID: 23914052
genomic era; history of toxicologic pathology; molecular biology
13.  Multiple Origins and Regional Dispersal of Resistant dhps in African Plasmodium falciparum Malaria 
PLoS Medicine  2009;6(4):e1000055.
Cally Roper and colleagues analyze the distribution of sulfadoxine resistance mutations and flanking microsatellite loci to trace the emergence and dispersal of drug-resistant Plasmodium falciparum malaria in Africa.
Background
Although the molecular basis of resistance to a number of common antimalarial drugs is well known, a geographic description of the emergence and dispersal of resistance mutations across Africa has not been attempted. To that end we have characterised the evolutionary origins of antifolate resistance mutations in the dihydropteroate synthase (dhps) gene and mapped their contemporary distribution.
Methods and Findings
We used microsatellite polymorphism flanking the dhps gene to determine which resistance alleles shared common ancestry and found five major lineages each of which had a unique geographical distribution. The extent to which allelic lineages were shared among 20 African Plasmodium falciparum populations revealed five major geographical groupings. Resistance lineages were common to all sites within these regions. The most marked differentiation was between east and west African P. falciparum, in which resistance alleles were not only of different ancestry but also carried different resistance mutations.
Conclusions
Resistant dhps has emerged independently in multiple sites in Africa during the past 10–20 years. Our data show the molecular basis of resistance differs between east and west Africa, which is likely to translate into differing antifolate sensitivity. We have also demonstrated that the dispersal patterns of resistance lineages give unique insights into recent parasite migration patterns.
Editors' Summary
Background
Plasmodium falciparum, a mosquito-borne parasite that causes malaria, kills nearly one million people every year, mostly in sub-Saharan Africa. People become infected with P. falciparum when they are bitten by a mosquito that has acquired the parasite in a blood meal taken from an infected person. P. falciparum malaria, which is characterized by recurring fevers and chills, anemia (loss of red blood cells), and damage to vital organs, can be fatal within hours of symptom onset if untreated. Until recently, treatment in Africa relied on chloroquine and sulfadoxine–pyrimethamine. Unfortunately, parasites resistant to both of these antimalarial drugs are now widespread. Consequently, the World Health Organization currently recommends artemisinin combination therapy for the treatment of P. falciparum malaria in Africa and other places where drug-resistant malaria is common. In this therapy, artemisinin derivatives (new fast-acting antimalarial agents) are used in combination with another antimalarial to reduce the chances of P. falciparum becoming resistant to either drug.
Why Was This Study Done?
P. falciparum becomes resistant to antimalarial drugs by acquiring “resistance mutations,” genetic changes that prevent these drugs from killing the parasite. A mutation in the gene encoding a protein called the chloroquine resistance transporter causes resistance to chloroquine, a specific group of mutations in the dihydrofolate reductase gene causes resistance to pyrimethamine, and several mutations in dhps, the gene that encodes dihydropteroate synthase, are associated with resistance to sulfadoxine. Scientists have discovered that the mutations causing chloroquine and pyrimethamine resistance originated in Asia and spread into Africa (probably multiple times) in the late 1970s and mid-1980s, respectively. These Asian-derived mutations are now common throughout Africa and, consequently, it is not possible to determine how they spread across the continent. Information of this sort would, however, help experts design effective measures to control the spread of drug-resistant P. falciparum. Because the mutations in dhps that cause sulfadoxine resistance only began to emerge in the mid-1990s, they haven't spread evenly across Africa yet. In this study, therefore, the researchers use genetic methods to characterize the geographical origins and contemporary distribution of dhps resistance mutations in Africa.
What Did the Researchers Do and Find?
The researchers analyzed dhps mutations in P. falciparum DNA from blood samples collected from patients with malaria in various African countries and searched the scientific literature for other similar studies. Together, these data show that five major variant dhps sequences (three of which contain mutations that confer various degrees of resistance to sulfadoxine in laboratory tests) are currently present in Africa, each with a unique geographical distribution. In particular, the data show that P. falciparum parasites in east and west Africa carry different resistance mutations. Next, the researchers looked for microsatellite variants in the DNA flanking the dhps gene. Microsatellites are DNA regions that contain short, repeated sequences of nucleotides. Because the number of repeats can vary and because microsatellites are inherited together with nearby genes, the ancestry of various resistance mutations can be worked out by examining the microsatellites flanking different mutant dhps genes. This analysis revealed five regional clusters in which the same resistance lineage was present at all the sites examined within the region and also showed that the resistance mutations in east and west Africa have a different ancestry.
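The grouping logic described above can be sketched in miniature: resistance alleles that share the same flanking-microsatellite haplotype are assigned to a putative common lineage. The sample IDs, allele names, and repeat counts below are invented for illustration; the real analysis compares many loci statistically rather than by exact matching.

```python
from collections import defaultdict

# Toy sketch: group samples into putative lineages by the pair
# (resistance allele, flanking-microsatellite haplotype).
# All identifiers and repeat counts here are hypothetical.
samples = [
    ("west_1", ("dhps-437G", (101, 88, 120))),
    ("west_2", ("dhps-437G", (101, 88, 120))),
    ("east_1", ("dhps-540E", (97, 92, 115))),
    ("east_2", ("dhps-540E", (97, 92, 115))),
]

lineages = defaultdict(list)
for sample_id, (allele, haplotype) in samples:
    # Same allele on the same flanking haplotype -> shared ancestry.
    lineages[(allele, haplotype)].append(sample_id)

print(len(lineages))  # → 2 distinct allele+haplotype lineages
```

Under this toy model, the east and west African samples fall into separate lineages because they differ both in the resistance mutation and in the flanking haplotype, mirroring the east/west split reported in the study.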
What Do These Findings Mean?
These findings show that sulfadoxine-resistant P. falciparum has recently emerged independently at multiple sites in Africa and that the molecular basis for sulfadoxine resistance is different in east and west Africa. This latter result may have clinical implications because it suggests that the effectiveness of sulfadoxine as an antimalarial drug may vary across the continent. Finally, although many more samples need to be analyzed to build a complete picture of the spread of antimalarial resistance across Africa, these findings suggest that economic and transport infrastructures may have played a role in governing recent parasite dispersal across this continent by affecting human migration. Thus, coordinated malaria control campaigns across socioeconomically linked areas in Africa may reduce the African malaria burden more effectively than campaigns that are confined to national territories.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000055.
This study is further discussed in a PLoS Medicine Perspective by Tim Anderson
The MedlinePlus encyclopedia contains a page on malaria (in English and Spanish)
Information is available from the World Health Organization on malaria (in several languages) and on drug-resistant malaria
The US Centers for Disease Control and Prevention provide information on malaria (in English and Spanish)
Information is available from the Roll Back Malaria Partnership on its approach to the global control of malaria, and on malaria control efforts in specific parts of the world
The WorldWide Antimalarial Resistance Network is creating an international database about antimalarial drug resistance
doi:10.1371/journal.pmed.1000055
PMCID: PMC2661256  PMID: 19365539
14.  The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration 
Nature Biotechnology  2007;25(11):1251.
The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or ‘ontologies’. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium is pursuing a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing coordinated reform, and new ontologies are being created on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable and logically well formed and to incorporate accurate representations of biological reality. We describe this OBO Foundry initiative and provide guidelines for those who might wish to become involved.
doi:10.1038/nbt1346
PMCID: PMC2814061  PMID: 17989687
15.  Multifunctional crop trait ontology for breeders' data: field book, annotation, data discovery and semantic enrichment of the literature 
AoB Plants  2010;2010:plq008.
The ‘Crop Ontology’ database we describe provides a controlled vocabulary for several economically important crops. It facilitates data integration and discovery from global databases and digital literature. This allows researchers to exploit comparative phenotypic and genotypic information of crops to elucidate functional aspects of traits.
Background and aims
Agricultural crop databases maintained in gene banks of the Consultative Group on International Agricultural Research (CGIAR) are valuable sources of information for breeders. These databases provide comparative phenotypic and genotypic information that can help elucidate functional aspects of plant and agricultural biology. To facilitate data sharing within and between these databases and the retrieval of information, the crop ontology (CO) database was designed to provide controlled vocabulary sets for several economically important plant species.
Methodology
Existing public ontologies and equivalent catalogues of concepts covering the range of crop science information and descriptors for crops and crop-related traits were collected from breeders, physiologists, agronomists, and researchers in the CGIAR consortium. For each crop, relationships between terms were identified and crop-specific trait ontologies were constructed following the Open Biomedical Ontologies (OBO) format standard using the OBO-Edit tool. All terms within an ontology were assigned a globally unique CO term identifier.
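The OBO flat-file format mentioned above is a simple tag-value text format of stanzas such as [Term]. A minimal parsing sketch follows; the term ID, name, and parent below are hypothetical placeholders, not entries from the actual Crop Ontology.

```python
# A hypothetical [Term] stanza in the OBO flat-file style
# (illustrative ID, name, and parent only).
OBO_STANZA = """\
[Term]
id: CO_320:0000034
name: grain yield
def: "Harvested grain mass per unit area."
is_a: CO_320:0000010
"""

def parse_obo_term(stanza: str) -> dict:
    """Parse a single [Term] stanza into a tag -> value dict."""
    term = {}
    for line in stanza.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(": ")
        term[tag] = value
    return term

term = parse_obo_term(OBO_STANZA)
print(term["id"], "-", term["name"])  # → CO_320:0000034 - grain yield
```

The globally unique CO term identifier mentioned in the text corresponds to the `id:` tag; the `is_a:` tag records the relationship to a parent term within the same ontology.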
Principal results
The CO currently comprises crop-specific traits for chickpea (Cicer arietinum), maize (Zea mays), potato (Solanum tuberosum), rice (Oryza sativa), sorghum (Sorghum spp.) and wheat (Triticum spp.). Several plant-structure and anatomy-related terms for banana (Musa spp.), wheat and maize are also included. In addition, multi-crop passport terms are included as controlled vocabularies for sharing information on germplasm. Two web-based online resources were built to make these COs available to the scientific community: the ‘CO Lookup Service’ for browsing the CO; and the ‘Crops Terminizer’, an ontology text mark-up tool.
Conclusions
The controlled vocabularies of the CO are being used to curate several CGIAR centres' agronomic databases. The use of ontology terms to describe agronomic phenotypes and the accurate mapping of these descriptions into databases will be important steps in comparative phenotypic and genotypic studies across species and gene-discovery experiments.
doi:10.1093/aobpla/plq008
PMCID: PMC3000699  PMID: 22476066
16.  Tobacco Company Efforts to Influence the Food and Drug Administration-Commissioned Institute of Medicine Report Clearing the Smoke: An Analysis of Documents Released through Litigation 
PLoS Medicine  2013;10(5):e1001450.
Stanton Glantz and colleagues investigate efforts by tobacco companies to influence Clearing the Smoke, a 2001 Institute of Medicine report on harm reduction tobacco products.
Please see later in the article for the Editors' Summary
Background
Spurred by the creation of potential modified risk tobacco products, the US Food and Drug Administration (FDA) commissioned the Institute of Medicine (IOM) to assess the science base for tobacco “harm reduction,” leading to the 2001 IOM report Clearing the Smoke. The objective of this study was to determine how the tobacco industry organized to try to influence the IOM committee that prepared the report.
Methods and Findings
We analyzed previously secret tobacco industry documents in the University of California, San Francisco Legacy Tobacco Documents Library, and IOM public access files. (A limitation of this method is that the tobacco companies have withheld some possibly relevant documents.) Tobacco companies considered the IOM report to have high-stakes regulatory implications. They developed and implemented strategies with consulting and legal firms to access the IOM proceedings. When the IOM study staff invited the companies to provide information on exposure and disease markers, clinical trial design for safety and efficacy, and implications for initiation and cessation, tobacco company lawyers, consultants, and in-house regulatory staff shaped presentations from company scientists. Although the available evidence does not permit drawing cause-and-effect conclusions, and the IOM may have come to the same conclusions without the influence of the tobacco industry, the companies were pleased with the final report, particularly the recommendations for a tiered claims system (with separate tiers for exposure and risk, which they believed would ease the process of qualifying for a claim) and license to sell products comparable to existing conventional cigarettes (“substantial equivalence”) without prior regulatory approval. Some principles from the IOM report, including elements of the substantial equivalence recommendation, appear in the 2009 Family Smoking Prevention and Tobacco Control Act.
Conclusions
Tobacco companies strategically interacted with the IOM to win several favored scientific and regulatory recommendations.
Please see later in the article for the Editors' Summary
Editors' Summary
Background
Up to half of tobacco users will die of cancer, lung disease, heart disease, stroke, or another tobacco-related disease. Cigarettes and other tobacco products cause disease because they expose their users to nicotine and numerous other toxic chemicals. Tobacco companies have been working to develop a “safe” cigarette for more than half a century. Initially, their attention focused on cigarettes that produced lower tar and nicotine yields in machine-smoking tests. These products were perceived as “safer” products by the public and scientists for many years, but it is now known that the use of low-yield cigarettes can actually expose smokers to higher levels of toxins than standard cigarettes. More recently, the tobacco companies have developed other products (for example, products that heat aerosols of nicotine, rather than burning the tobacco) that claim to reduce harm and the risk of tobacco-related disease, but they can only market these modified risk tobacco products in the US after obtaining Food and Drug Administration (FDA) approval. In 1999, the FDA commissioned the US Institute of Medicine (IOM, an influential source of independent expert advice on medical issues) to assess the science base for tobacco “harm reduction.” In 2001, the IOM published its report Clearing the Smoke: Assessing the Science Base for Tobacco Harm Reduction, which, although controversial, set the tone for the development and regulation of tobacco products in the US, particularly those claiming to be less dangerous, in subsequent years.
Why Was This Study Done?
Tobacco companies have a long history of working to shape scientific discussions and agendas. For example, they have produced research results designed to “create controversy” about the dangers of smoking and secondhand smoke. In this study, the researchers investigate how tobacco companies organized to try to influence the IOM committee that prepared the Clearing the Smoke report on modified risk tobacco products by analyzing tobacco industry and IOM documents.
What Did the Researchers Do and Find?
The researchers searched the Legacy Tobacco Documents Library (a collection of internal tobacco industry documents released as a result of US litigation cases) for documents outlining how tobacco companies tried to influence the IOM Committee to Assess the Science Base for Tobacco Harm Reduction and created a timeline of events from the 1,000 or so documents they retrieved. They confirmed and supplemented this timeline using information in 80 files that detailed written interactions between the tobacco companies and the IOM committee, which they obtained through a public records access request. Analysis of these documents indicates that the tobacco companies considered the IOM report to have important regulatory implications, that they developed and implemented strategies with consulting and legal firms to access the IOM proceedings, and that tobacco company lawyers, consultants, and regulatory staff shaped presentations to the IOM committee by company scientists on various aspects of tobacco harm reduction products. The analysis also shows that tobacco companies were pleased with the final report, particularly its recommendation that tobacco products can be marketed with exposure or risk reduction claims provided the products substantially reduce exposure and provided the behavioral and health consequences of these products are determined in post-marketing surveillance and epidemiological studies (“tiered testing”) and its recommendation that, provided no claim of reduced exposure or risk is made, new products comparable to existing conventional cigarettes (“substantial equivalence”) can be marketed without prior regulatory approval.
What Do These Findings Mean?
These findings suggest that tobacco companies used their legal and regulatory staff to access the IOM committee that advised the FDA on modified risk tobacco products and that they used this access to deliver specific, carefully formulated messages designed to serve their business interests. Although these findings provide no evidence that the efforts of tobacco companies influenced the IOM committee in any way, they show that the companies were satisfied with the final IOM report and its recommendations, some of which have policy implications that continue to reverberate today. The researchers therefore call for the FDA and other regulatory bodies to remember that they are dealing with companies with a long history of intentionally misleading the public when assessing the information presented by tobacco companies as part of the regulatory process and to actively protect their public-health policies from the commercial interests of the tobacco industry.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001450.
This study is further discussed in a PLOS Medicine Perspective by Thomas Novotny
The World Health Organization provides information about the dangers of tobacco (in several languages); for information about the tobacco industry's influence on policy, see the 2009 World Health Organization report Tobacco industry interference with tobacco control
A PLOS Medicine Research Article by Heide Weishaar and colleagues describes tobacco company efforts to undermine the Framework Convention on Tobacco Control, an international instrument for tobacco control
Wikipedia has a page on tobacco harm reduction (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
The IOM report Clearing the Smoke: Assessing the Science Base for Tobacco Harm Reduction is available to read online
The Legacy Tobacco Documents Library is a public, searchable database of tobacco company internal documents detailing their advertising, manufacturing, marketing, sales, and scientific activities
The University of California, San Francisco Center for Tobacco Control Research and Education is the focal point for University of California, San Francisco (UCSF) scientists in disciplines ranging from the molecular biology of nicotine addiction to political science, who combine their efforts to eradicate tobacco use and tobacco-induced cancer and other diseases worldwide
SmokeFree, a website provided by the UK National Health Service, offers advice on quitting smoking and includes personal stories from people who have stopped smoking
Smokefree.gov, from the US National Cancer Institute, offers online tools and resources to help people quit smoking
doi:10.1371/journal.pmed.1001450
PMCID: PMC3665841  PMID: 23723740
17.  The Chilling Effect: How Do Researchers React to Controversy? 
PLoS Medicine  2008;5(11):e222.
Background
Can political controversy have a “chilling effect” on the production of new science? This is a timely concern, given how often American politicians are accused of undermining science for political purposes. Yet little is known about how scientists react to these kinds of controversies.
Methods and Findings
Drawing on interview (n = 30) and survey data (n = 82), this study examines the reactions of scientists whose National Institutes of Health (NIH)-funded grants were implicated in a highly publicized political controversy. Critics charged that these grants were “a waste of taxpayer money.” The NIH defended each grant and no funding was rescinded. Nevertheless, this study finds that many of the scientists whose grants were criticized now engage in self-censorship. About half of the sample said that they now remove potentially controversial words from their grant applications, and a quarter reported eliminating entire topics from their research agendas. Four researchers reportedly moved into more secure positions entirely, either outside academia or into jobs with guaranteed salaries. About 10% of the group reported that this controversy strengthened their commitment to complete their research and disseminate it widely.
Conclusions
These findings provide evidence that political controversies can shape what scientists choose to study. Debates about the politics of science usually focus on the direct suppression, distortion, and manipulation of scientific results. This study suggests that scholars must also examine how scientists may self-censor in response to political events.
Drawing on interview and survey data, Joanna Kempner's study finds that political controversies shape what many scientists choose not to study.
Editors' Summary
Background.
Scientific research is an expensive business and, inevitably, the organizations that fund this research—governments, charities, and industry—play an important role in determining the directions that this research takes. Funding bodies can have both positive and negative effects on the acquisition of scientific knowledge. They can pump money into topical areas such as the human genome project. Alternatively, by withholding funding, they can discourage some types of research. So, for example, US federal funds cannot be used to support many aspects of human stem cell research. “Self-censoring” by scientists can also have a negative effect on scientific progress. That is, some scientists may decide to avoid areas of research in which there are many regulatory requirements, political pressure, or in which there is substantial pressure from advocacy groups. A good example of this last type of self-censoring is the withdrawal of many scientists from research that involves certain animal models, like primates, because of animal rights activists.
Why Was This Study Done?
Some people think that political controversy might also encourage scientists to avoid some areas of scientific inquiry, but no studies have formally investigated this possibility. Could political arguments about the value of certain types of research influence the questions that scientists pursue? An argument of this sort occurred in the US in 2003 when Patrick Toomey, who was then a Republican Congressional Representative, argued that National Institutes of Health (NIH) grants supporting research into certain aspects of sexual behavior were “much less worthy of taxpayer funding” than research on “devastating diseases,” and proposed an amendment to the 2004 NIH appropriations bill (which regulates the research funded by NIH). The Amendment was rejected, but more than 200 NIH-funded grants, most of which examined behaviors that affect the spread of HIV/AIDS, were internally reviewed later that year; NIH defended each grant, so none were curtailed. In this study, Joanna Kempner investigates how the scientists whose US federal grants were targeted in this clash between politics and science responded to the political controversy.
What Did the Researchers Do and Find?
Kempner interviewed 30 of the 162 principal investigators (PIs) whose grants were reviewed. She asked them to describe their research, the grants that were reviewed, and their experience with NIH before, during, and after the controversy. She also asked them whether this experience had changed their research practice. She then used the information from these interviews to design a survey that she sent to all the PIs whose grants had been reviewed; 82 responded. About half of the scientists interviewed and/or surveyed reported that they now remove “red flag” words (for example, “AIDS” and “homosexual”) from the titles and abstracts of their grant applications. About one-fourth of the respondents no longer included controversial topics (for example, “abortion” and “emergency contraception”) in their research agendas, and four researchers had made major career changes as a result of the controversy. Finally, about 10% of respondents said that their experience had strengthened their commitment to see their research completed and its results published although even many of these scientists also engaged in some self-censorship.
What Do These Findings Mean?
These findings show that, even though no funding was withdrawn, self-censoring is now common among the scientists whose grants were targeted during this particular political controversy. Because this study included researchers in only one area of health research, its findings may not be generalizable to other areas of research. Furthermore, because only half of the PIs involved in the controversy responded to the survey, these findings may be affected by selection bias. That is, the scientists most anxious about the effects of political controversy on their research funding (and thus more likely to engage in self-censorship) may not have responded. Nevertheless, these findings suggest that the political environment might have a powerful effect on self-censorship by scientists and might dissuade some scientists from embarking on research projects that they would otherwise have pursued. Further research into what Kempner calls the “chilling effect” of political controversy on scientific research is now needed to ensure that a healthy balance can be struck between political involvement in scientific decision making and scientific progress.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0050222.
The Consortium of Social Science Associations, an advocacy organization that provides a bridge between the academic research community and Washington policymakers, has more information about the political controversy initiated by Patrick Toomey
Some of Kempner's previous research on self-censorship by scientists is described in a 2005 National Geographic news article
doi:10.1371/journal.pmed.0050222
PMCID: PMC2586361  PMID: 19018657
18.  A common layer of interoperability for biomedical ontologies based on OWL EL 
Bioinformatics  2011;27(7):1001-1008.
Motivation: Ontologies are essential in biomedical research due to their ability to semantically integrate content from different scientific databases and resources. Their application improves capabilities for querying and mining biological knowledge. An increasing number of ontologies are being developed for this purpose, and considerable effort is invested into formally defining them in order to represent their semantics explicitly. However, current biomedical ontologies do not yet facilitate data integration and interoperability, since reasoning over them is highly complex and often cannot be performed efficiently, if at all. We propose the use of less expressive subsets of ontology representation languages to enable efficient reasoning and achieve the goal of genuine interoperability between ontologies.
Results: We present and evaluate EL Vira, a framework that transforms OWL ontologies into the OWL EL subset, thereby enabling the use of tractable reasoning. We illustrate which OWL constructs and inferences are kept and lost following the conversion and demonstrate the performance gain in reasoning, indicated by a significant reduction in processing time. We applied EL Vira to the open biomedical ontologies and provide a repository of ontologies resulting from this conversion. EL Vira creates a common layer of ontological interoperability that, for the first time, enables the creation of software solutions that can employ biomedical ontologies to perform inferences and answer complex queries to support scientific analyses.
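The tractability argument can be illustrated in miniature: the OWL 2 EL profile restricts the language (essentially to conjunction and existential restriction) so that core inferences such as class subsumption run in polynomial time. The toy sketch below (not EL Vira itself; the class names are invented) saturates a set of asserted SubClassOf axioms into their transitive closure, one of the basic entailments an EL reasoner computes.

```python
from collections import defaultdict

# Asserted SubClassOf axioms, as (subclass, superclass) pairs.
# Class names are hypothetical examples, not from any real ontology.
axioms = [
    ("Hepatocyte", "Cell"),
    ("Cell", "AnatomicalEntity"),
    ("AnatomicalEntity", "MaterialEntity"),
]

def subsumption_closure(axioms):
    """Return every entailed (sub, super) pair as a set."""
    supers = defaultdict(set)
    for sub, sup in axioms:
        supers[sub].add(sup)
    changed = True
    while changed:  # saturate: A ⊑ B and B ⊑ C entail A ⊑ C
        changed = False
        for sub in list(supers):
            for mid in list(supers[sub]):
                new = supers.get(mid, set()) - supers[sub]
                if new:
                    supers[sub] |= new
                    changed = True
    return {(a, b) for a, sups in supers.items() for b in sups}

entailed = subsumption_closure(axioms)
print(("Hepatocyte", "MaterialEntity") in entailed)  # → True
```

A full EL reasoner also handles existential restrictions and concept conjunctions, but the same saturation-until-fixpoint style applies, which is why the conversion to OWL EL yields the processing-time reductions the abstract reports.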
Availability and implementation: The EL Vira software is available from http://el-vira.googlecode.com and converted OBO ontologies and their mappings are available from http://bioonto.gen.cam.ac.uk/el-ont.
Contact: rh497@cam.ac.uk
doi:10.1093/bioinformatics/btr058
PMCID: PMC3065691  PMID: 21343142
19.  Applications and methods utilizing the Simple Semantic Web Architecture and Protocol (SSWAP) for bioinformatics resource discovery and disparate data and service integration 
BioData Mining  2010;3:3.
Background
Scientific data integration and computational service discovery are challenges for the bioinformatics community. This process is made more difficult by the separate and independent construction of biological databases, which makes the exchange of data between information resources difficult and labor intensive. A recently described semantic web protocol, the Simple Semantic Web Architecture and Protocol (SSWAP; pronounced "swap"), offers the ability to describe data and services in a semantically meaningful way. We report how three major information resources (Gramene, SoyBase and the Legume Information System [LIS]) used SSWAP to semantically describe selected data and web services.
Methods
We selected high-priority Quantitative Trait Locus (QTL), genomic mapping, trait, phenotypic, and sequence data and associated services such as BLAST for publication, data retrieval, and service invocation via semantic web services. Data and services were mapped to concepts and categories as implemented in legacy and de novo community ontologies. We used SSWAP to express these offerings in OWL Web Ontology Language (OWL), Resource Description Framework (RDF) and eXtensible Markup Language (XML) documents, which are appropriate for their semantic discovery and retrieval. We implemented SSWAP services to respond to web queries and return data. These services are registered with the SSWAP Discovery Server and are available for semantic discovery at http://sswap.info.
Results
A total of ten services delivering QTL information from Gramene were created. From SoyBase, we created six services delivering information about soybean QTLs, and seven services delivering genetic locus information. For LIS we constructed three services, two of which allow the retrieval of DNA and RNA FASTA sequences with the third service providing nucleic acid sequence comparison capability (BLAST).
Conclusions
The need for semantic integration technologies preceded the availability of solutions. We report the feasibility of mapping high-priority data from local, independent, idiosyncratic data schemas to common shared concepts as implemented in web-accessible ontologies. These mappings are then amenable for use in semantic web services. Our implementation of approximately two dozen services means that biological data at three large information resources (Gramene, SoyBase, and LIS) is available for programmatic access, semantic searching, and enhanced interaction between the separate missions of these resources.
doi:10.1186/1756-0381-3-3
PMCID: PMC2894815  PMID: 20525377
20.  Global Health Governance and the Commercial Sector: A Documentary Analysis of Tobacco Company Strategies to Influence the WHO Framework Convention on Tobacco Control 
PLoS Medicine  2012;9(6):e1001249.
Heide Weishaar and colleagues did an analysis of internal tobacco industry documents together with other data and describe the industry's strategic response to the proposed World Health Organization Framework Convention on Tobacco Control.
Background
In successfully negotiating the Framework Convention on Tobacco Control (FCTC), the World Health Organization (WHO) has led a significant innovation in global health governance, helping to transform international tobacco control. This article provides the first comprehensive review of the diverse campaign initiated by transnational tobacco corporations (TTCs) to try to undermine the proposed convention.
Methods and Findings
The article is primarily based on an analysis of internal tobacco industry documents made public through litigation, triangulated with data from official documentation relating to the FCTC process and websites of relevant organisations. It is also informed by a comprehensive review of previous studies concerning tobacco industry efforts to influence the FCTC. The findings demonstrate that the industry's strategic response to the proposed WHO convention was two-fold. First, arguments and frames were developed to challenge the FCTC, including: claiming there would be damaging economic consequences; depicting tobacco control as an agenda promoted by high-income countries; alleging the treaty conflicted with trade agreements, “good governance,” and national sovereignty; questioning WHO's mandate; claiming the FCTC would set a precedent for issues beyond tobacco; and presenting corporate social responsibility (CSR) as an alternative. Second, multiple tactics were employed to promote and increase the impact of these arguments, including: directly targeting FCTC delegations and relevant political actors, enlisting diverse allies (e.g., mass media outlets and scientists), and using stakeholder consultation to delay decisions and secure industry participation.
Conclusions
TTCs' efforts to undermine the FCTC were comprehensive, demonstrating the global application of tactics that TTCs have previously been found to have employed nationally and further included arguments against the FCTC as a key initiative in global health governance. Awareness of these strategies can help guard against industry efforts to disrupt the implementation of the FCTC and support the development of future, comparable initiatives in global health.
Please see later in the article for the Editors' Summary
Editors' Summary
Background
Every year, about 5 million people die worldwide from tobacco-related causes and, if current trends continue, annual deaths from tobacco-related causes will increase to 10 million by 2030. In response to this global tobacco epidemic, the World Health Organization (WHO) has developed an international instrument for tobacco control called the Framework Convention on Tobacco Control (FCTC). Negotiations on the FCTC began in 1999, and the international treaty—the first to be negotiated under the auspices of WHO—entered into force on 27 February 2005. To date, 174 countries have become parties to the FCTC. As such, they agree to implement comprehensive bans on tobacco advertising, promotion, and sponsorship; to ban misleading and deceptive terms on cigarette packaging; to implement health warnings on tobacco packaging; to protect people from tobacco smoke exposure in public spaces and indoor workplaces; to implement taxation policies aimed at reducing tobacco consumption; and to combat illicit trade in tobacco products.
Why Was This Study Done?
Transnational tobacco corporations (TTCs) are sometimes described as “vectors” of the global tobacco epidemic because of their drive to maximize shareholder value and tobacco consumption. Just like conventional disease vectors (agents that carry or transmit infectious organisms), TTCs employ a variety of tactics to ensure the spread of tobacco consumption. For example, various studies have shown that TTCs have developed strategies that attempt to limit the impact of tobacco control measures such as the FCTC. However, to date, studies investigating the influence of TTCs on the FCTC have concentrated on specific countries or documented specific tactics. Here, the researchers undertake a comprehensive review of the diverse tactics employed by TTCs to undermine the development of the FCTC. Such a review is important because its results should facilitate the effective implementation of FCTC measures and could support the development of future tobacco control initiatives and of global initiatives designed to control alcohol-related and food-related disease and death.
What Did the Researchers Do and Find?
The researchers analyzed documents retrieved from the Legacy Tobacco Documents Library (a collection of internal tobacco industry documents released as a result of US litigation cases) dealing with the strategies employed by TTCs to influence the FCTC alongside data from the websites of industry, consultancy, and other organizations cited in the documents; the official records of the FCTC process; and previous studies of tobacco industry efforts to influence the FCTC. Their analysis reveals that the strategic response of the major TTCs to the proposed FCTC was two-fold. First, the TTCs developed a series of arguments and “frames” (beliefs and ideas that provide a framework for thinking about an issue) to challenge the FCTC. Core frames included claiming that the FCTC would have damaging economic consequences, questioning WHO's mandate to develop a legally binding international treaty by claiming that tobacco was not a cross-border problem, and presenting corporate social responsibility (the commitment by business to affect the environment, consumers, employees, and society positively in addition to making money for shareholders) as an alternative to the FCTC. Second, the TTCs employed multiple strategies to promote and increase the impact of these arguments and frames, such as targeting FCTC delegations and enlisting the help of diverse allies including media outlets and scientists.
What Do These Findings Mean?
These findings illustrate the variety and complexity of the tobacco industry's efforts to undermine the FCTC and show the extent to which TTCs combined and coordinated tactics on a global stage that they had previously used on a national stage. Indeed, “the comprehensiveness and scale of the tobacco industry's response to the FCTC suggests that it is reasonable to speak of a ‘globalisation of tobacco industry strategy’ in combating the development of effective tobacco control policies,” write the researchers. Awareness of the strategies employed by TTCs to influence the FCTC should help guard against industry efforts to disrupt the implementation of the FCTC and should support the development of future global tobacco control initiatives. More generally, these findings should support the development of global health initiatives designed to tackle cardiovascular disease, cancer, chronic respiratory diseases and diabetes – non-communicable diseases that together account for 60% of global deaths and are partly driven by the commercial activities of food, alcohol, and tobacco corporations.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001249.
The World Health Organization provides information about the dangers of tobacco (in several languages) and about the Framework Convention on Tobacco Control
For information about the tobacco industry's influence on policy, see the 2009 World Health Organization report Tobacco industry interference with tobacco control
The Framework Convention Alliance provides more information about the FCTC
The Legacy Tobacco Documents Library is a public, searchable database of tobacco company internal documents detailing their advertising, manufacturing, marketing, sales, and scientific activities
The UK Centre for Tobacco Control Studies is a network of UK universities that undertakes original research, policy development, advocacy, and teaching and training in the field of tobacco control
SmokeFree, a website provided by the UK National Health Service, offers advice on quitting smoking and includes personal stories from people who have stopped smoking
Smokefree.gov, from the US National Cancer Institute, offers online tools and resources to help people quit smoking and not start again
doi:10.1371/journal.pmed.1001249
PMCID: PMC3383743  PMID: 22745607
21.  The Protein Feature Ontology: A Tool for the Unification of Protein Annotations 
Bioinformatics (Oxford, England)  2008;24(23):2767-2772.
The advent of sequencing and structural genomics projects has provided a dramatic boost in the number of protein structures and sequences. Due to the high-throughput nature of these projects, many of the molecules are uncharacterised and their functions unknown. This, in turn, has led to the need for a greater number and diversity of tools and databases providing annotation through transfer based on homology and prediction methods. Though many such tools to annotate protein sequence and structure exist, they are spread throughout the world, often with dedicated individual web pages. This situation does not provide a consensus view of the data and hinders comparison between methods. Integration of these methods is needed. So far this has not been possible because no common vocabulary was available to serve as a standard language: a variety of terms could be used to describe any particular feature, ranging from different spellings to completely different terms. The Protein Feature Ontology (http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=BS) is a structured controlled vocabulary for features of a protein sequence or structure. It provides a common language for tools and methods to use, so that integration and comparison of their annotations is possible. The Protein Feature Ontology comprises approximately 100 positional terms (describing features located in a particular region of the sequence), which have been integrated into the Sequence Ontology (SO). Forty non-positional terms, which describe general protein properties, have also been defined; in addition, post-translational modifications are described using an already existing ontology, the Protein Modification Ontology (MOD). The Protein Feature Ontology has been used by the BioSapiens Network of Excellence, a consortium comprising 19 partner sites in 14 European countries generating over 150 distinct annotation types for protein sequences and structures.
doi:10.1093/bioinformatics/btn528
PMCID: PMC2912506  PMID: 18936051
22.  Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO) 
Nucleic Acids Research  2002;30(1):69-72.
The Saccharomyces Genome Database (SGD) resources, ranging from genetic and physical maps to genome-wide analysis tools, reflect the scientific progress in identifying genes and their functions over the last decade. As emphasis shifts from identification of the genes to identification of the role of their gene products in the cell, SGD seeks to provide its users with annotations that will allow relationships to be made between gene products, both within Saccharomyces cerevisiae and across species. To this end, SGD is annotating genes to the Gene Ontology (GO), a structured representation of biological knowledge that can be shared across species. The GO consists of three separate ontologies describing molecular function, biological process and cellular component. The goal is to use published information to associate each characterized S.cerevisiae gene product with one or more GO terms from each of the three ontologies. To be useful, this must be done in a manner that allows accurate associations based on experimental evidence, modifications to GO when necessary, and careful documentation of the annotations through evidence codes for given citations. Reaching this goal is an ongoing process at SGD. For information on the current progress of GO annotations at SGD and other participating databases, as well as a description of each of the three ontologies, please visit the GO Consortium page at http://www.geneontology.org. SGD gene associations to GO can be found by visiting our site at http://genome-www.stanford.edu/Saccharomyces/.
PMCID: PMC99086  PMID: 11752257
23.  Logical Gene Ontology Annotations (GOAL): exploring gene ontology annotations with OWL 
Journal of Biomedical Semantics  2012;3(Suppl 1):S3.
Motivation
Ontologies such as the Gene Ontology (GO) and their use in annotations make cross species comparisons of genes possible, along with a wide range of other analytical activities. The bio-ontologies community, in particular the Open Biomedical Ontologies (OBO) community, have provided many other ontologies and an increasingly large volume of annotations of gene products that can be exploited in query and analysis. As many annotations with different ontologies centre upon gene products, there is an opportunity to explore gene products through multiple ontological perspectives at the same time. Questions could be asked that link a gene product’s function, process, cellular location, phenotype and disease. Current tools, such as AmiGO, allow exploration of genes based on their GO annotations, but not through multiple ontological perspectives. In addition, the semantics of these ontologies’ representations should, through automated reasoning, afford richer query opportunities over the gene product annotations than is currently possible.
Results
To do this multi-perspective, richer querying of gene product annotations, we have created the Logical Gene Ontology, or GOAL ontology, in OWL that combines the Gene Ontology, Human Disease Ontology and the Mammalian Phenotype Ontology, together with classes that represent the annotations with these ontologies for mouse gene products. Each mouse gene product is represented as a class, with the appropriate relationships to the GO aspects, phenotype and disease with which it has been annotated. We then use defined classes to query these protein classes through automated reasoning, and to build a complex hierarchy of gene products. We have presented this through a Web interface that allows arbitrary queries to be constructed and the results displayed.
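The pattern above, in which each gene product carries annotations from several ontologies and a defined class acts as a membership test that a reasoner evaluates, can be sketched in plain Python. This is an illustrative analogue only, not actual OWL reasoning, and the gene products and terms shown are hypothetical examples, not taken from the GOAL ontology itself.

```python
# Simplified analogue of GOAL's defined-class querying: each gene product
# carries GO, phenotype, and disease annotations, and a "defined class" is a
# set of necessary-and-sufficient conditions used to classify the products.
annotations = {  # hypothetical gene products and annotation terms
    "Trp53": {"go": {"apoptotic process"}, "disease": {"cancer"}},
    "Lep":   {"go": {"energy homeostasis"}, "phenotype": {"obesity"}},
}

def defined_class(go=None, disease=None, phenotype=None):
    """Build a membership test from the given conditions."""
    def member(product):
        ann = annotations[product]
        return all(term in ann.get(key, set())
                   for key, term in [("go", go), ("disease", disease),
                                     ("phenotype", phenotype)] if term)
    return member

# "Classify" products under the defined class, as a reasoner would
cancer_related = defined_class(disease="cancer")
hits = [p for p in annotations if cancer_related(p)]
print(hits)  # → ['Trp53']
```

An OWL reasoner does the same classification declaratively, from equivalent-class axioms rather than hand-written predicates, and also places the defined classes themselves into a subsumption hierarchy.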
Conclusion
This standard use of OWL affords a rich interaction with Gene Ontology, Human Disease Ontology and Mammalian Phenotype Ontology annotations for the mouse, to give a fine partitioning of the gene products in the GOAL ontology. OWL in combination with automated reasoning can be effectively used to query across ontologies to ask biologically rich questions. We have demonstrated that automated reasoning can be used to deliver practical on-line querying support for the ontology annotations available for the mouse.
Availability
The GOAL Web page is to be found at http://owl.cs.manchester.ac.uk/goal.
doi:10.1186/2041-1480-3-S1-S3
PMCID: PMC3337258  PMID: 22541594
24.  GermOnline, a cross-species community knowledgebase on germ cell differentiation 
Nucleic Acids Research  2004;32(Database issue):D560-D567.
GermOnline provides information and microarray expression data for genes involved in mitosis and meiosis, gamete formation and germ line development across species. The database has been developed, and is being curated and updated, by life scientists in cooperation with bioinformaticists. Information is contributed through an online form using free text, images and the controlled vocabulary developed by the Gene Ontology Consortium. Authors provide up to three references in support of their contribution. The database is governed by an international board of scientists to ensure a standardized data format and the highest quality of GermOnline’s information content. Release 2.0 provides exclusive access to microarray expression data from Saccharomyces cerevisiae and Rattus norvegicus, as well as curated information on ∼700 genes from various organisms. The locus report pages include links to external databases that contain relevant annotation, microarray expression and proteome data. Conversely, the Saccharomyces Genome Database (SGD), S.cerevisiae GeneDB and Swiss-Prot link to the budding yeast section of GermOnline from their respective locus pages. GermOnline, a fully operational prototype subject-oriented knowledgebase designed for community annotation and array data visualization, is accessible at http://www.germonline.org. The target audience includes researchers who work on mitotic cell division, meiosis, gametogenesis, germ line development, human reproductive health and comparative genomics.
doi:10.1093/nar/gkh055
PMCID: PMC308789  PMID: 14681481
25.  Obol: Integrating Language and Meaning in Bio-Ontologies 
Comparative and Functional Genomics  2004;5(6-7):509-520.
Ontologies are intended to capture and formalize a domain of knowledge. The ontologies comprising the Open Biological Ontologies (OBO) project, which includes the Gene Ontology (GO), are formalizations of various domains of biological knowledge. Ontologies within OBO typically lack computable definitions that serve to differentiate a term from other similar terms. The computer is unable to determine the meaning of a term, which presents problems for tools such as automated reasoners. Reasoners can be of enormous benefit in managing a complex ontology. OBO term names frequently implicitly encode the kind of definitions that can be used by computational tools, such as automated reasoners. The definitions encoded in the names are not easily amenable to computation, because the names are ostensibly natural language phrases designed for human users. These names are highly regular in their grammar, and can thus be treated as valid sentences in some formal or computable language. With a description of the rules underlying this formal language, term names can be parsed to derive computable definitions, which can then be reasoned over. This paper describes the effort to elucidate that language, called Obol, and the attempts to reason over the resulting definitions. The current implementation finds unique non-trivial definitions for around half of the terms in the GO, and has been used to find 223 missing relationships, which have since been added to the ontology. Obol has utility as an ontology maintenance tool, and as a means of generating computable definitions for a whole ontology.
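The core idea, that grammatically regular term names can be parsed into genus-plus-differentia definitions, can be sketched with a single grammar rule. The rule below and the `regulates` relation name are simplified illustrations, not Obol's actual grammar or output format.

```python
import re

# Minimal sketch of the Obol idea: a GO-style name of the form
# "[positive|negative] regulation of Y" is parsed into a genus term plus a
# differentia naming the regulated process. Illustrative rule only.
RULE = re.compile(
    r"^(?P<genus>regulation|positive regulation|negative regulation)"
    r" of (?P<target>.+)$")

def parse_term(name):
    m = RULE.match(name)
    if not m:
        return None  # no computable definition derived from this name
    # Computable definition: genus plus a 'regulates' differentia
    return {"genus": m.group("genus"), "regulates": m.group("target")}

definition = parse_term("negative regulation of cell proliferation")
print(definition)  # → {'genus': 'negative regulation', 'regulates': 'cell proliferation'}
```

A real implementation needs many such rules and must hand the derived definitions to a reasoner, which is where missing subsumption relationships, such as the 223 reported above, become detectable.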
The software is available under an open-source license from: http://www.fruitfly.org/~cjm/obol. Supplementary material for this article can be found at: http://www.interscience.wiley.com/jpages/1531-6912/suppmat.
doi:10.1002/cfg.435
PMCID: PMC2447432  PMID: 18629143
