1.  Quality Assurance in LOINC using Description Logic 
AMIA Annual Symposium Proceedings  2012;2012:1099-1108.
OBJECTIVE:
To assess whether errors can be found in LOINC by changing its representation to OWL DL and comparing its classification to that of SNOMED CT.
METHODS:
We created Description Logic definitions for LOINC concepts in OWL and merged the ontology with SNOMED CT to enrich the relatively flat hierarchy of LOINC parts. LOINC - SNOMED CT mappings were acquired through UMLS. The resulting ontology was classified with the ConDOR reasoner.
RESULTS:
Transformation into DL helped to identify 427 sets of logically equivalent LOINC codes, 676 sets of logically equivalent LOINC parts, and 239 inconsistencies in the LOINC multiaxial hierarchy. Automatic classification of LOINC and SNOMED CT combined increased the connectivity within the LOINC hierarchy and increased its coverage by an additional 9,006 LOINC codes.
CONCLUSIONS:
LOINC is a well-maintained terminology. While only a relatively small number of logical inconsistencies were found, we identified a number of areas where LOINC could benefit from the application of Description Logic.
PMCID: PMC3540427  PMID: 23304386
2.  Issues in Creating and Maintaining Value Sets for Clinical Quality Measures 
Objective:
To develop methods for assessing the validity, consistency and currency of value sets for clinical quality measures, in order to support the developers of quality measures in which such value sets are used.
Methods:
We assessed the well-formedness of the codes (in a given code system), the existence and currency of the codes in the corresponding code system, using the UMLS and RxNorm terminology services. We also investigated the overlap among value sets using the Jaccard similarity measure.
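As an illustration of the overlap measure named above, the following minimal sketch computes the Jaccard similarity between value sets, treating each value set as a plain set of codes. The function names, the dictionary layout, and the choice to score two empty sets as 1.0 are assumptions made for this example and are not taken from the study; the default threshold mirrors the cutoff reported in the results (Jaccard > .9).

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of codes."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty value sets count as identical
    return len(a & b) / len(a | b)


def similar_pairs(value_sets, threshold=0.9):
    """Return (name1, name2, score) for every pair scoring above the threshold.

    value_sets is a hypothetical mapping from value set name to its codes.
    """
    names = sorted(value_sets)
    pairs = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score = jaccard(value_sets[x], value_sets[y])
            if score > threshold:
                pairs.append((x, y, score))
    return pairs
```

A score of exactly 1.0 flags duplicate value sets, while scores just above the threshold flag near-duplicates that may deserve manual review.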
Results:
We extracted 163,788 codes (76,062 unique codes) from 1463 unique value sets in the 113 quality measures published by the National Quality Forum (NQF) in December 2011. Overall, 5% of the codes are invalid (4% of the unique codes). We also found 67 duplicate value sets and 10 pairs of value sets exhibiting a high degree of similarity (Jaccard > .9).
Conclusion:
Invalid codes affect a large proportion of the value sets (19%). 79% of the quality measures have at least one value set exhibiting errors. However, 50% of the quality measures exhibit errors in less than 10% of their value sets. The existence of duplicate and highly-similar value sets suggests the need for an authoritative repository of value sets and related tooling in order to support the development of quality measures.
PMCID: PMC3540585  PMID: 23304374
3.  Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature 
Bioinformatics  2010;27(3):408-415.
Motivation: A major goal of biomedical research in personalized medicine is to find relationships between mutations and their corresponding disease phenotypes. However, most of the disease-related mutational data are currently buried in the biomedical literature in textual form and lack the necessary structure to allow easy retrieval and visualization. We introduce a high-throughput computational method for the identification of relevant disease mutations in PubMed abstracts applied to prostate (PCa) and breast cancer (BCa) mutations.
Results: We developed the extractor of mutations (EMU) tool to identify mutations and their associated genes. We benchmarked EMU against MutationFinder—a tool to extract point mutations from text. Our results show that both methods achieve comparable performance on two manually curated datasets. We also benchmarked EMU's performance for extracting the complete mutational information and phenotype. Remarkably, we show that one of the steps in our approach, a filter based on sequence analysis, increases the precision for that task from 0.34 to 0.59 (PCa) and from 0.39 to 0.61 (BCa). We also show that this high-throughput approach can be extended to other diseases.
Discussion: Our method improves the current status of disease-mutation databases by significantly increasing the number of annotated mutations. We found 51 and 128 mutations manually verified to be related to PCa and BCa, respectively, that are not currently annotated for these cancer types in the OMIM or Swiss-Prot databases. EMU's retrieval performance represents a 2-fold improvement in the number of annotated mutations for PCa and BCa. We further show that our method can benefit from full-text analysis once there is an increase in Open Access availability of full-text articles.
Availability: Freely available at: http://bioinf.umbc.edu/EMU/ftp.
Contact: mkann@umbc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq667
PMCID: PMC3031038  PMID: 21138947
4.  A unified framework for managing provenance information in translational research 
BMC Bioinformatics  2011;12:461.
Background
A critical aspect of the NIH Translational Research roadmap, which seeks to accelerate the delivery of "bench-side" discoveries to the patient's "bedside," is the management of the provenance metadata that keeps track of the origin and history of data resources as they traverse the path from the bench to the bedside and back. A comprehensive provenance framework is essential for researchers to verify the quality of data, reproduce scientific results published in peer-reviewed literature, validate scientific processes, and associate trust value with data and results. Traditional approaches to provenance management have focused on only partial sections of the translational research life cycle and they do not incorporate "domain semantics", which is essential to support domain-specific querying and analysis by scientists.
Results
We identify a common set of challenges in managing provenance information across the pre-publication and post-publication phases of data in the translational research lifecycle. We define the semantic provenance framework (SPF), underpinned by the Provenir upper-level provenance ontology, to address these challenges in the four stages of provenance metadata:
(a) Provenance collection - during data generation
(b) Provenance representation - to support interoperability, reasoning, and incorporate domain semantics
(c) Provenance storage and propagation - to allow efficient storage and seamless propagation of provenance as the data is transferred across applications
(d) Provenance query - to support queries with increasing complexity over large data size and also support knowledge discovery applications
We apply the SPF to two exemplar translational research projects, namely the Semantic Problem Solving Environment for Trypanosoma cruzi (T.cruzi SPSE) and the Biomedical Knowledge Repository (BKR) project, to demonstrate its effectiveness.
Conclusions
The SPF provides a unified framework to effectively manage provenance of translational research data during pre and post-publication phases. This framework is underpinned by an upper-level provenance ontology called Provenir that is extended to create domain-specific provenance ontologies to facilitate provenance interoperability, seamless propagation of provenance, automated querying, and analysis.
doi:10.1186/1471-2105-12-461
PMCID: PMC3298549  PMID: 22126369
5.  Comparing and evaluating terminology services application programming interfaces: RxNav, UMLSKS and LexBIG 
To facilitate the integration of terminologies into applications, various terminology services application programming interfaces (APIs) have been developed in the recent past. In this study, three publicly available terminology services APIs, RxNav, UMLSKS and LexBIG, are compared and functionally evaluated with respect to the retrieval of information from one biomedical terminology, RxNorm, to which all three services provide access. A list of queries is established covering a wide spectrum of terminology services functionalities such as finding RxNorm concepts by their name, or navigating different types of relationships. Test data were generated from the RxNorm dataset to evaluate the implementation of the functionalities in the three APIs. The results revealed issues with various aspects of the API implementation (eg, handling of obsolete terms by LexBIG) and documentation (eg, navigational paths used in RxNav) that were subsequently addressed by the development teams of the three APIs investigated. Knowledge about such discrepancies helps inform the choice of an API for a given use case.
doi:10.1136/jamia.2009.001149
PMCID: PMC3000749  PMID: 20962136
6.  Extracting Rx information from clinical narrative 
Objective
The authors used the i2b2 Medication Extraction Challenge to evaluate their entity extraction methods, contribute to the generation of a publicly available collection of annotated clinical notes, and start developing methods for ontology-based reasoning using structured information generated from the unstructured clinical narrative.
Design
Extraction of salient features of medication orders from the text of de-identified hospital discharge summaries was addressed with a knowledge-based approach using simple rules and lookup lists. The entity recognition tool, MetaMap, was combined with dose, frequency, and duration modules specifically developed for the Challenge as well as a prototype module for reason identification.
Measurements
Evaluation metrics and corresponding results were provided by the Challenge organizers.
Results
The results indicate that robust rule-based tools achieve satisfactory results in extraction of simple elements of medication orders, but more sophisticated methods are needed for identification of reasons for the orders and durations.
Limitations
Owing to the time constraints and nature of the Challenge, some obvious follow-on analysis has not been completed yet.
Conclusions
The authors plan to integrate the new modules with MetaMap to enhance its accuracy. This integration effort will provide guidance in retargeting existing tools for better processing of clinical text.
doi:10.1136/jamia.2010.003970
PMCID: PMC2995679  PMID: 20819859
7.  The Translational Medicine Ontology and Knowledge Base: driving personalized medicine by bridging the gap between bench and bedside 
Journal of Biomedical Semantics  2011;2(Suppl 2):S1.
Background
Translational medicine requires the integration of knowledge using heterogeneous data from health care to the life sciences. Here, we describe a collaborative effort to produce a prototype Translational Medicine Knowledge Base (TMKB) capable of answering questions relating to clinical practice and pharmaceutical drug discovery.
Results
We developed the Translational Medicine Ontology (TMO) as a unifying ontology to integrate chemical, genomic and proteomic data with disease, treatment, and electronic health records. We demonstrate the use of Semantic Web technologies in the integration of patient and biomedical data, and reveal how such a knowledge base can aid physicians in providing tailored patient care and facilitate the recruitment of patients into active clinical trials. Thus, patients, physicians and researchers may explore the knowledge base to better understand therapeutic options, efficacy, and mechanisms of action.
Conclusions
This work takes an important step in using Semantic Web technologies to facilitate integration of relevant, distributed, external sources and progress towards a computational platform to support personalized medicine.
Availability
TMO can be downloaded from http://code.google.com/p/translationalmedicineontology and TMKB can be accessed at http://tm.semanticscience.org/sparql.
doi:10.1186/2041-1480-2-S2-S1
PMCID: PMC3102889  PMID: 21624155
8.  An Approximate Matching Method for Clinical Drug Names 
AMIA Annual Symposium Proceedings  2011;2011:1117-1126.
Objective:
To develop an approximate matching method for finding the closest drug names within existing RxNorm content for drug name variants found in local drug formularies.
Methods:
We used a drug-centric algorithm to determine the closest strings between the RxNorm data set and local variants which failed the exact and normalized string matching searches. Aggressive measures such as token splitting, drug name expansion and spelling correction were used to try to resolve drug names. The algorithm was evaluated against three sets containing a total of 17,164 drug name variants.
Results:
Mapping of the local variant drug names to the targeted concept descriptions ranged from 83.8% to 92.8% in three test sets. The algorithm identified the appropriate RxNorm concepts as the top candidate in 76.8%, 67.9% and 84.8% of the cases in the three test sets and among the top three candidates in 90–96% of the cases.
Conclusion:
Using a drug-centric token matching approach with aggressive measures to resolve unknown names provides effective mappings to clinical drug names and has the potential of facilitating the work of drug terminology experts in mapping local formularies to reference terminologies.
PMCID: PMC3243188  PMID: 22195172
10.  Auditing Associative Relations across Two Knowledge Sources 
Journal of biomedical informatics  2009;42(3):426-439.
Objectives
This paper proposes a novel semantic method for auditing associative relations in biomedical terminologies. We tested our methodology on two Unified Medical Language System (UMLS) knowledge sources.
Methods
We use the UMLS semantic groups as high-level representations of the domain and range of relationships in the Metathesaurus and in the Semantic Network. A mapping created between Metathesaurus relationships and Semantic Network relationships forms the basis for comparing the signatures of a given Metathesaurus relationship to the signatures of the semantic relationship to which it is mapped. The consistency of Metathesaurus relations is studied for each relationship.
Results
Of the 177 associative relationships in the Metathesaurus, 84 (48%) exhibit a high degree of consistency with the corresponding Semantic Network relationships. Overall, 63% of the 1.8M associative relations in the Metathesaurus are consistent with relations in the Semantic Network.
Conclusion
The semantics of associative relationships in biomedical terminologies should be defined explicitly by their developers. The Semantic Network would benefit from being extended with new relationships and with new relations for some existing relationships. The UMLS editing environment could take advantage of the correspondence established between relationships in the Metathesaurus and the Semantic Network. Finally, the auditing method also yielded useful information for refining the mapping of associative relationships between the two sources.
doi:10.1016/j.jbi.2009.01.004
PMCID: PMC2891883  PMID: 19475724
Biomedical terminologies; Associative relationships; Auditing methods; Unified Medical Language System (UMLS)
11.  GENESTRACE: PHENOMIC KNOWLEDGE DISCOVERY VIA STRUCTURED TERMINOLOGY 
The era of applied genomic medicine is quickly approaching accompanied by the increasing availability of detailed genetic information. Understanding the genetic etiology behind complex, multi-gene diseases remains an important challenge. In order to uncover the putative genetic etiology of complex diseases, we designed a method that explores the relationships between two major terminological and ontological resources: the Unified Medical Language System (UMLS) and the Gene Ontology (GO). The UMLS has a mainly clinical emphasis; Gene Ontology (GO) has become the standard for biological annotations of genes and gene products. Using statistical and semantic relationships within and between the two resources, we are able to infer relationships between disease concepts in the UMLS and gene products annotated using GO and its associated databases. We validated our inferences by comparing them to the known gene-disease relationships, as defined in the Online Mendelian Inheritance in Man's morbidmap. The proof-of-concept methods presented here are unique in that they bypass the ambiguity of the direct extraction of gene or disease terms from MEDLINE. Additionally, our methods provide direct links to clinically significant diseases through established terminologies or ontologies. The preliminary results presented here indicate the potential utility of exploiting the existing, manually curated relationships in biomedical resources as a tool for the discovery of potentially valuable new gene-disease relationships. The GenesTrace system may be accessed at the following URL: http://phene.cpmc.columbia.edu:8080/genesTrace/index.jsp
PMCID: PMC2894422  PMID: 15759618
12.  A Graph-based Approach to Auditing RxNorm 
Journal of biomedical informatics  2009;42(3):558-570.
Objectives
RxNorm is a standardized nomenclature for clinical drug entities developed by the National Library of Medicine. In this paper, we audit relations in RxNorm for consistency and completeness through the systematic analysis of the graph of its concepts and relationships.
Methods
The representation of multi-ingredient drugs is normalized in order to make it compatible with that of single-ingredient drugs. All meaningful paths between two nodes in the type graph are computed and instantiated. Alternate paths are automatically compared and manually inspected in case of inconsistency.
Results
The 115 meaningful paths identified in the type graph can be grouped into 28 groups with respect to start and end nodes. Of the 19 groups of alternate paths (i.e., with two or more paths) between the start and end nodes, 9 (47%) exhibit inconsistencies. Overall, 28 (24%) of the 115 paths are inconsistent with other alternate paths. A total of 348 inconsistencies were identified in the April 2008 version of RxNorm and reported to the RxNorm team, of which 215 (62%) had been corrected in the January 2009 version of RxNorm.
Conclusion
The inconsistencies identified involve missing nodes (93), missing links (17), extraneous links (237) and one case of mix-up between two ingredients. Our auditing method proved effective in identifying a limited number of errors that had defeated the quality assurance mechanisms currently in place in the RxNorm production system. Some recommendations for the development of RxNorm are provided.
doi:10.1016/j.jbi.2009.04.004
PMCID: PMC2722378  PMID: 19394440
Biomedical terminologies; Auditing methods; RxNorm; Quality assurance; Graphs
13.  The caBIG Terminology Review Process 
Journal of biomedical informatics  2008;42(3):571-580.
The National Cancer Institute (NCI) is developing an integrated biomedical informatics infrastructure, the cancer Biomedical Informatics Grid (caBIG®), to support collaboration within the cancer research community. A key part of the caBIG architecture is the establishment of terminology standards for representing data. In order to evaluate the suitability of existing controlled terminologies, the caBIG Vocabulary and Data Elements Workspace (VCDE WS) working group has developed a set of criteria that serve to assess a terminology's structure, content, documentation, and editorial process. This paper describes the evolution of these criteria and the results of their use in evaluating four standard terminologies: the Gene Ontology (GO), the NCI Thesaurus (NCIt), the Common Terminology Criteria for Adverse Events (known as CTCAE), and the laboratory portion of the Logical Observation Identifiers Names and Codes (LOINC). The resulting caBIG criteria are presented as a matrix that may be applicable to any terminology standardization effort.
doi:10.1016/j.jbi.2008.12.003
PMCID: PMC2729758  PMID: 19154797
Terminology; Ontology; Auditing; Evaluation
14.  Methods for Managing Variation in Clinical Drug Names 
Objectives:
To develop normalization methods for managing the variation in clinical drug names.
Methods:
Manual examination of drug names from RxNorm and local variants collected from formularies led to the identification of three types of drug-specific normalization rules: expansion of abbreviations (e.g., tab to tablet); reformatting of specific elements (e.g., space between number and unit); and removal of salt variants (e.g., succinate from metoprolol succinate).
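The three rule types can be sketched as a small normalization function. This is only an illustrative sketch, not the authors' implementation: the lookup tables are hypothetical apart from the "tab" and "succinate" examples given above, and the sketch assumes that the number/unit rule means inserting a space (e.g., "50mg" becomes "50 mg").

```python
import re

# Hypothetical lookup tables; only "tab" and "succinate" come from the abstract.
ABBREVIATIONS = {"tab": "tablet", "cap": "capsule"}
SALTS = {"succinate", "hydrochloride"}


def normalize_drug_name(name):
    """Apply the three drug-specific rule types to a single drug name."""
    text = name.lower()
    # Rule: reformat number/unit (assumed here to mean inserting a space).
    text = re.sub(r"(\d)([a-z])", r"\1 \2", text)
    tokens = text.split()
    # Rule: expand abbreviations.
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    # Rule: remove salt variants.
    tokens = [t for t in tokens if t not in SALTS]
    return " ".join(tokens)
```

For example, under these assumptions "Metoprolol Succinate 50mg Tab" normalizes to "metoprolol 50 mg tablet", which can then be compared against normalized RxNorm strings.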
Results:
After drug-specific normalization, recall of 3397 previously non-matching names from formularies reaches 45% overall (70% of some subsets), compared to 10–20% after generic normalization. Ambiguity has not increased significantly in the RxNorm dataset.
Conclusions:
A limited number of drug-specific normalization operations provide significant improvement over general language normalization.
PMCID: PMC3041346  PMID: 21347056
15.  Investigating Drug Classes in Biomedical Terminologies From the Perspective of Clinical Decision Support 
Objectives:
To assess whether 1) the necessary drug classes and 2) the necessary drug-class membership relations are represented in biomedical terminologies in order to support clinical decisions regarding drug-drug interactions.
Methods:
In order to investigate drug classes and drug-class membership in clinical terminologies, we start by establishing a reference list of these entities. Then, we map drugs and classes to the UMLS, where we investigate their relations.
Results:
186 (83%) of the 223 names for drug classes mapped to the UMLS. The single best source is SNOMED CT with 75%. 140 (89%) of the 157 drug-membership relations were found in the UMLS.
Conclusions:
One important category of drug classes missing from all clinical terminologies is related to drug metabolism by the Cytochrome P450 enzyme family.
PMCID: PMC3041355  PMID: 21346940
16.  Large-scale, Exhaustive Lattice-based Structural Auditing of SNOMED CT 
One criterion for the well-formedness of ontologies is that their hierarchical structure forms a lattice. Formal Concept Analysis (FCA) has been used as a technique for assessing the quality of ontologies, but is not scalable to large ontologies such as SNOMED CT (> 300k concepts). We developed a methodology called Lattice-based Structural Auditing (LaSA), for auditing biomedical ontologies, implemented through automated SPARQL queries, in order to exhaustively identify all non-lattice pairs in SNOMED CT. The percentage of non-lattice pairs ranges from 0 to 1.66 among the 19 SNOMED CT hierarchies. Preliminary manual inspection of a limited portion of the over 544k non-lattice pairs, among over 356 million candidate pairs, revealed inconsistent use of precoordination in SNOMED CT, but also a number of false positives. Our results are consistent with those based on FCA, with the advantage that the LaSA pipeline is scalable and applicable to ontological systems consisting mostly of taxonomic links.
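To make the lattice criterion concrete, the sketch below enumerates non-lattice pairs in a small hierarchy held in memory. It assumes that a pair of concepts is a non-lattice pair when the two concepts are not hierarchically related to each other and share more than one maximal common descendant; that definition, the dictionary-based representation, and all function names are assumptions for this example. The abstract describes a SPARQL-based pipeline, which this Python sketch does not reproduce.

```python
from itertools import combinations


def descendants(children, node):
    """Return all strict descendants of node in a DAG given as parent -> children."""
    seen = set()
    stack = list(children.get(node, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(children.get(cur, ()))
    return seen


def maximal_common_descendants(a, b, desc):
    """Common descendants of a and b that lie below no other common descendant."""
    common = desc[a] & desc[b]
    return {c for c in common if not any(c in desc[d] for d in common)}


def non_lattice_pairs(children):
    """List pairs of unrelated concepts with more than one maximal common descendant."""
    nodes = set(children) | {c for cs in children.values() for c in cs}
    desc = {n: descendants(children, n) for n in nodes}
    result = []
    for a, b in combinations(sorted(nodes), 2):
        if a in desc[b] or b in desc[a]:
            continue  # hierarchically related pairs are skipped
        if len(maximal_common_descendants(a, b, desc)) > 1:
            result.append((a, b))
    return result
```

This brute-force enumeration is quadratic in the number of concepts and is meant only to illustrate the criterion on toy hierarchies, not to scale to SNOMED CT.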
PMCID: PMC3041382  PMID: 21347113
17.  Looking for Anemia (and Other Disorders) in SNOMED CT: Comparison of Three Approaches and Practical Implications 
Health professionals are faced with challenges when they have to exploit the semantics of concepts present in clinical terminologies in support of research activities. The difficulty lies in the fact that this semantics is represented not only through the labels of concepts, but also their position in the hierarchy, and, when available, their logical and textual definitions. We investigate and contrast the lexical, hierarchical, and logical representations of concepts in SNOMED CT through the example of Anemia and three other disorders. The four use cases we developed suggest that the lexical, hierarchical, and logical representations of concepts have a limited degree of overlap, but are complementary. Finally, we draw practical implications from our findings for SNOMED CT users and developers.
PMCID: PMC3041405  PMID: 21347034
18.  An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence 
Journal of biomedical informatics  2008;41(5):752-765.
Objectives
This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base.
Methods
We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries.
Results
Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins.
Conclusion
Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces.
Resource page
http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/
doi:10.1016/j.jbi.2008.02.006
PMCID: PMC2766186  PMID: 18395495
Semantic Web; Semantic mashup; Nicotine dependence; Information integration; Ontologies
19.  Identification of cis-Regulatory Elements in Gene Co-expression Networks Using A-GLAM 
Reliable identification and assignment of cis-regulatory elements in promoter regions is a challenging problem in biology. The sophistication of transcriptional regulation in higher eukaryotes, particularly in metazoans, could be an important factor contributing to their organismal complexity. Here we present an integrated approach where networks of co-expressed genes are combined with gene ontology–derived functional networks to discover clusters of genes that share both similar expression patterns and functions. Regulatory elements are identified in the promoter regions of these gene clusters using a Gibbs sampling algorithm implemented in the A-GLAM software package. Using this approach, we analyze the cell-cycle co-expression network of the yeast Saccharomyces cerevisiae, showing that this approach correctly identifies cis-regulatory elements present in clusters of co-expressed genes.
doi:10.1007/978-1-59745-243-4_1
PMCID: PMC2719765  PMID: 19381547
Promoter sequences; transcription factor–binding sites; co-expression; networks; gene ontology; Gibbs sampling
20.  Alignment of the UMLS semantic network with BioTop: methodology and assessment 
Bioinformatics  2009;25(12):i69-i76.
Motivation: For many years, the Unified Medical Language System (UMLS) semantic network (SN) has been used as an upper-level semantic framework for the categorization of terms from terminological resources in biomedicine. BioTop has recently been developed as an upper-level ontology for the biomedical domain. In contrast to the SN, it is founded upon strict ontological principles, using OWL DL as a formal representation language, which has become standard in the semantic Web. In order to make logic-based reasoning available for the resources annotated or categorized with the SN, a mapping ontology was developed aligning the SN with BioTop.
Methods: The theoretical foundations and the practical realization of the alignment are described, with a focus on the design decisions taken, the problems encountered and the adaptations of BioTop that became necessary. For evaluation purposes, UMLS concept pairs obtained from MEDLINE abstracts by a named entity recognition system were tested for possible semantic relationships. Furthermore, all semantic-type combinations that occur in the UMLS Metathesaurus were checked for satisfiability.
Results: The effort-intensive alignment process required major design changes and enhancements of BioTop and brought up several design errors that could be fixed. A comparison between a human curator and the ontology yielded only a low agreement. Ontology reasoning was also used to successfully identify 133 inconsistent semantic-type combinations.
Availability: BioTop, the OWL DL representation of the UMLS SN, and the mapping ontology are available at http://www.purl.org/biotop/.
Contact: stschulz@uni-freiburg.de
doi:10.1093/bioinformatics/btp194
PMCID: PMC2687948  PMID: 19478019
21.  Co-evolutionary Rates of Functionally Related Yeast Genes 
Evolutionary knowledge is often used to facilitate computational attempts at gene function prediction. One rich source of evolutionary information is the relative rates of gene sequence divergence, and in this report we explore the connection between gene evolutionary rates and function. We performed a genome-scale evaluation of the relationship between evolutionary rates and functional annotations for the yeast Saccharomyces cerevisiae. Non-synonymous (dN) and synonymous (dS) substitution rates were calculated for 1,095 orthologous gene sets common to S. cerevisiae and six other closely related yeast species. Differences in evolutionary rates between pairs of genes (ΔdN & ΔdS) were then compared to their functional similarities (sGO), which were measured using Gene Ontology (GO) annotations. Substantial and statistically significant correlations were found between ΔdN and sGO, whereas there is no apparent relationship between ΔdS and sGO. These results are consistent with a mode of action for natural selection that is based on similar rates of elimination of deleterious protein coding sequence variants for functionally related genes. The connection between gene evolutionary rates and function was stronger than seen for phylogenetic profiles, which have previously been employed to inform functional inference. The co-evolution of functionally related yeast genes points to the relevance of specific function for the efficacy of natural selection and underscores the utility of gene evolutionary rates for functional predictions.
PMCID: PMC2674680  PMID: 18345352
Functional inference; Co-evolution; natural selection; genome evolution; gene ontology
22.  Using SNOMED CT in combination with MedDRA for reporting signal detection and adverse drug reactions reporting 
Objective:
To investigate the feasibility of using SNOMED CT as an entry point for coding adverse drug reactions and map them automatically to MedDRA for reporting purposes and interoperability with legacy repositories.
Methods:
On the one hand, we attempt to map SNOMED CT concepts to MedDRA concepts through the UMLS, using synonymy and explicit mapping relations. On the other, we compute the set of all fine-grained concepts that can be reached from concepts having a mapping to MedDRA.
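The second step, reaching fine-grained concepts from concepts that already have a mapping, can be sketched as a traversal of the hierarchy. The sketch assumes that a descendant without its own direct mapping inherits the MedDRA targets of every mapped ancestor; that inheritance rule, the data layout, and the identifiers are hypothetical and serve only to illustrate the idea.

```python
from collections import deque


def propagate_mappings(direct_map, children):
    """Link unmapped descendants to the MedDRA targets of their mapped ancestors.

    direct_map: hypothetical dict, SNOMED CT concept id -> set of MedDRA terms.
    children:   hypothetical dict, SNOMED CT concept id -> list of child ids.
    """
    linked = {}
    for concept, targets in direct_map.items():
        queue = deque(children.get(concept, ()))
        seen = set()
        while queue:
            cur = queue.popleft()
            if cur in seen:
                continue
            seen.add(cur)
            if cur not in direct_map:
                # Only concepts lacking a direct mapping are newly linked.
                linked.setdefault(cur, set()).update(targets)
            queue.extend(children.get(cur, ()))
    return linked
```

The number of keys in the returned dictionary corresponds to the count of additional concepts that become linkable through the hierarchy under these assumptions.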
Results:
58% of the Preferred Terms in MedDRA have a mapping to SNOMED CT. Through the descendants in SNOMED CT, 108,305 additional SNOMED CT concepts can be linked to MedDRA.
Conclusions:
Fine-grained SNOMED CT concepts can be mapped automatically to MedDRA. This approach has the potential to enable the collection of adverse events related to drugs directly from clinical repositories. The quality of the mapping needs to be evaluated.
PMCID: PMC2815504  PMID: 20351820
23.  Two approaches to integrating phenotype and clinical information 
Linkages between animal models of diseases and human data enable the development of translational research hypotheses. The objective of this study is to investigate two approaches to integrating phenotype and clinical information. On the one hand, we develop a terminology mapping between phenotypes from the Mammalian Phenotype Ontology (MPO) and Online Mendelian Inheritance in Man (OMIM) through the Unified Medical Language System (UMLS). On the other, we associate MPO phenotypes with OMIM manifestations through annotations made to orthologous genes. 1,469 MPO concepts (22%) were mapped successfully to some disease concept in the UMLS, of which 869 were present in OMIM. Among the 16,764 distinct MGI genes associated with human orthologs, 1,968 distinct genes were associated with both MPO and OMIM annotations. The UMLS is a valuable resource for linking phenotype terms to clinical terminologies, and these mappings between terminologies can help enrich gene annotation databases and unify phenotype representation.
PMCID: PMC2815427  PMID: 20351826
24.  Experience in Aligning Anatomical Ontologies 
An ontology is a formal representation of a domain modeling the entities in the domain and their relations. When a domain is represented by multiple ontologies, there is a need to create mappings among these ontologies in order to facilitate the integration of data annotated with these ontologies and reasoning across ontologies. The objective of this paper is to recapitulate our experience in aligning large anatomical ontologies and to reflect on some of the issues and challenges encountered along the way. The four anatomical ontologies under investigation are the Foundational Model of Anatomy, GALEN, the Adult Mouse Anatomical Dictionary and the NCI Thesaurus. Their underlying representation formalisms are all different. Our approach to aligning concepts (directly) is automatic, rule-based, and operates at the schema level, generating mostly point-to-point mappings. It uses a combination of domain-specific lexical techniques and structural and semantic techniques (to validate the mappings suggested lexically). It also takes advantage of domain-specific knowledge (lexical knowledge from external resources such as the Unified Medical Language System, as well as knowledge augmentation and inference techniques). In addition to point-to-point mapping of concepts, we present the alignment of relationships and the mapping of concepts group-to-group. We have also successfully tested an indirect alignment through a domain-specific reference ontology. We present an evaluation of our techniques, both against a gold standard established manually and against a generic schema matching system. The advantages and limitations of our approach are analyzed and discussed throughout the paper.
PMCID: PMC2575410  PMID: 18974854
Ontology; ontology alignment; knowledge representation; anatomy; Semantic Web
25.  From “Glycosyltransferase” to “Congenital Muscular Dystrophy”: Integrating Knowledge from NCBI Entrez Gene and the Gene Ontology 
Entrez Gene (EG), Online Mendelian Inheritance in Man (OMIM) and the Gene Ontology (GO) are three complementary knowledge resources that can be used to correlate genomic data with disease information. However, bridging between genotype and phenotype through these resources currently requires manual effort or the development of customized software. In this paper, we argue that integrating EG and GO provides a robust and flexible solution to this problem. We demonstrate how the Resource Description Framework (RDF) developed for the Semantic Web can be used to represent and integrate these resources and enable seamless access to them as a unified resource. We illustrate the effectiveness of our approach by answering a real-world biomedical query linking a specific molecular function, glycosyltransferase, to the disorder congenital muscular dystrophy.
PMCID: PMC2562001  PMID: 17911917
knowledge integration; Semantic Web; RDF; Entrez Gene; Gene Ontology
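The idea of representing two resources as triples and querying them as one graph can be sketched without any RDF library. The snippet below is a toy illustration under stated assumptions: the gene identifiers (`gene:G1`, `gene:G2`), the predicate names, and every triple are hypothetical placeholders, not content from Entrez Gene or the Gene Ontology, and plain Python sets stand in for an RDF store and query language.

```python
# Hypothetical gene-centric triples (subject, predicate, object).
eg_triples = {
    ("gene:G1", "has_molecular_function", "go:glycosyltransferase_activity"),
    ("gene:G2", "has_molecular_function", "go:kinase_activity"),
    ("gene:G1", "associated_with_disease", "disease:congenital_muscular_dystrophy"),
    ("gene:G2", "associated_with_disease", "disease:other_disorder"),
}

# Hypothetical ontology triples describing the function hierarchy.
go_triples = {
    ("go:glycosyltransferase_activity", "is_a", "go:transferase_activity"),
    ("go:kinase_activity", "is_a", "go:transferase_activity"),
}

# Integration step: because both sources share identifiers, a set union
# is enough to obtain a single graph.
graph = eg_triples | go_triples


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(graph, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in graph if p == predicate and o == obj}


def diseases_for_function(graph, function):
    """Gene/disease pairs for genes annotated with the function or a direct subclass of it."""
    functions = {function} | subjects(graph, "is_a", function)
    genes = set()
    for f in functions:
        genes |= subjects(graph, "has_molecular_function", f)
    return {
        (gene, disease)
        for gene in genes
        for disease in objects(graph, gene, "associated_with_disease")
    }


links = diseases_for_function(graph, "go:glycosyltransferase_activity")
print(sorted(links))
```

Only one level of `is_a` is followed here to keep the sketch short; a fuller version would follow the hierarchy transitively.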
