We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web-accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g., SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions.
Ontologies; data integration; semantic web; query web
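As a concrete illustration of the chaining idea in the preceding abstract, the sketch below fetches the XML results of one saved query and feeds values from them into a second query. The URLs, query names, parameter names, and XML element names are hypothetical placeholders, not part of QI itself.

```python
# Minimal sketch of query chaining: one saved query's XML results feed a second
# query. All URLs and element names below are hypothetical placeholders.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

def run_saved_query(base_url, **params):
    """Execute a saved query over HTTP and parse its XML result document."""
    url = base_url + ("?" + urllib.parse.urlencode(params) if params else "")
    with urllib.request.urlopen(url) as resp:
        return ET.fromstring(resp.read())

# Query 1 (hypothetical; might be SPARQL behind the scenes): list brain regions.
regions = run_saved_query("http://example.org/qi/query/brain_regions")

# Because every saved query returns XML, values can be pulled out and chained
# into Query 2 (hypothetical; might be written in XQuery over another source).
for region in regions.findall(".//region"):
    images = run_saved_query("http://example.org/qi/query/images_for_region",
                             region=region.attrib.get("id"))
    print(region.attrib.get("id"), len(images.findall(".//image")))
```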
The complexity and rapid growth of genetic data demand investment in information technology to support effective use of this information. Creating infrastructure to communicate genetic information to health care providers and enable them to manage that data can positively affect a patient’s care in many ways. However, genetic data are complex and present many challenges. We report on the usability of a novel application designed to assist providers in receiving and managing a patient’s genetic profile, including ongoing updated interpretations of the genetic variants in those patients. Because these interpretations are constantly evolving, managing them represents a challenge. We conducted usability tests with potential users of this application and reported the findings to the application development team, which addressed many of them in subsequent versions. Clinicians were excited about the value this tool provides in pushing out variant updates to providers and overall gave the application high usability ratings, but had some difficulty interpreting elements of the interface. Many of the issues identified required relatively little development effort to fix, suggesting that consistently incorporating this type of analysis in the development process can be highly beneficial. For genetic decision support applications, our findings suggest the importance of designing a system that can deliver the most current knowledge and highlight the significance of new genetic information for clinical care. Our results demonstrate that using a user-focused development and design process helped optimize the value of this application for personalized medicine.
clinical decision support; electronic health records; genomics; personalized medicine
In this study we present novel feature engineering techniques that leverage the biomedical domain knowledge encoded in the Unified Medical Language System (UMLS) to improve machine-learning based clinical text classification. Critical steps in clinical text classification include identification of features and passages relevant to the classification task, and representation of clinical text to enable discrimination between documents of different classes. We developed novel information-theoretic techniques that utilize the taxonomical structure of the UMLS to improve feature ranking, and we developed a semantic similarity measure that projects clinical text into a feature space that improves classification. We evaluated these methods on the 2008 Informatics for Integrating Biology and the Bedside (i2b2) obesity challenge. The methods we developed improve upon the results of this challenge’s top machine-learning based system, and may improve the performance of other machine-learning based clinical text classification systems. We have released all tools developed as part of this study as open source, available at http://code.google.com/p/ytex.
Natural Language Processing; Document Classification; Semantic Similarity; Feature Selection; Kernel Methods; Information Gain; Information Content
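To make the information-content idea in the preceding abstract more tangible, here is a small, self-contained sketch of a taxonomy-based similarity in the style of Lin's measure. The toy hierarchy, frequency counts, and the choice of Lin's formula are illustrative assumptions; the abstract does not specify the paper's exact measure.

```python
import math

# Toy child -> parent links standing in for a fragment of the UMLS taxonomy.
parents = {
    "morbid obesity": "obesity",
    "obesity": "nutritional disorder",
    "nutritional disorder": "disorder",
    "diabetes mellitus": "metabolic disorder",
    "metabolic disorder": "disorder",
}

# Toy corpus counts per concept (each count includes the concept's descendants).
freq = {"morbid obesity": 20, "obesity": 120, "nutritional disorder": 150,
        "diabetes mellitus": 90, "metabolic disorder": 140, "disorder": 400}
total = freq["disorder"]

def ancestors(concept):
    """Path from a concept up to the taxonomy root, including the concept itself."""
    path = [concept]
    while concept in parents:
        concept = parents[concept]
        path.append(concept)
    return path

def ic(concept):
    """Information content: -log p(concept), estimated from the toy counts."""
    return -math.log(freq[concept] / total)

def lin_similarity(c1, c2):
    """Lin's measure: 2 * IC(closest shared ancestor) / (IC(c1) + IC(c2))."""
    shared = set(ancestors(c2))
    lcs = next(a for a in ancestors(c1) if a in shared)  # root is always shared
    return 2 * ic(lcs) / (ic(c1) + ic(c2))

print(lin_similarity("morbid obesity", "obesity"))            # closely related
print(lin_similarity("morbid obesity", "diabetes mellitus"))  # only share the root
```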
The main objective of this study was to investigate the feasibility of using PharmGKB, a pharmacogenomic database, as a source of training data in combination with text of MEDLINE abstracts for a text mining approach to identification of potential gene targets for pathway-driven pharmacogenomics research. We used the manually curated relations between drugs and genes in the PharmGKB database to train a support vector machine predictive model and applied this model prospectively to MEDLINE abstracts. The gene targets suggested by this approach were subsequently manually reviewed. Our quantitative analysis showed that a support vector machine classifier trained on MEDLINE abstracts, with single words (unigrams) used as features and PharmGKB relations used for supervision, achieves an overall sensitivity of 85% and specificity of 69%. The subsequent qualitative analysis showed that gene targets “suggested” by the automatic classifier were not anticipated by expert reviewers but were subsequently found to be relevant to the three drugs that were investigated: carbamazepine, lamivudine and zidovudine. Our results show that this approach is not only feasible but may also find new gene targets not identifiable by other methods, thus making it a valuable tool for pathway-driven pharmacogenomics research.
pharmacogenomics; text mining; support vector machine; pathway-driven analysis; gene-drug associations; PharmGKB
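The preceding abstract specifies the core modeling setup: unigram features over abstract text, an SVM classifier, labels derived from PharmGKB relations, and evaluation by sensitivity and specificity. A minimal sketch of that setup, with tiny made-up example texts and labels standing in for the real MEDLINE corpus, might look like this:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.svm import LinearSVC

# Placeholder "abstracts"; the study labeled real MEDLINE abstracts using
# curated PharmGKB drug-gene relations as supervision.
texts = [
    "carbamazepine response is modulated by HLA-B variants",
    "zidovudine toxicity associated with ABCB1 polymorphisms",
    "this abstract discusses renal physiology without any drug gene link",
    "study of dietary habits in a healthy adult cohort",
]
labels = [1, 1, 0, 0]  # 1 = abstract expresses a drug-gene relation

vectorizer = CountVectorizer(lowercase=True)   # unigram bag-of-words features
X = vectorizer.fit_transform(texts)
model = LinearSVC().fit(X, labels)

# Sensitivity and specificity (here computed on the training data for brevity);
# the study reported 85% sensitivity and 69% specificity on held-out abstracts.
tn, fp, fn, tp = confusion_matrix(labels, model.predict(X)).ravel()
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
```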
Recent progress in high-throughput genomic technologies has shifted pharmacogenomic research from candidate gene pharmacogenetics to clinical pharmacogenomics (PGx). Many clinically relevant questions may be asked, such as ‘what drug should be prescribed for a patient with mutant alleles?’ Typically, answers to such questions can be found in publications mentioning the relationships among the genes, drugs and diseases of interest. In this work, we hypothesize that ClinicalTrials.gov is a comparable source rich in PGx-related information. In this regard, we developed a systematic approach to automatically identify PGx relationships between genes, drugs and diseases from trial records in ClinicalTrials.gov. In our evaluation, we found that our extracted relationships overlap significantly with the factual knowledge curated from the literature in a PGx database and that most relationships appear on average 5 years earlier in clinical trials than in their corresponding publications, suggesting that clinical trials may be valuable for both validating known and capturing new PGx-related information in a more timely manner. Furthermore, two human reviewers judged a portion of the computer-generated relationships and found an overall accuracy of 74% for our text-mining approach. This work has practical implications in enriching our existing knowledge of PGx gene–drug–disease relationships as well as suggesting crosslinks between ClinicalTrials.gov and other PGx knowledge bases.
Text mining; Clinical outcome; Pharmacogenomics; Clinical trial
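As a rough illustration of the relation-identification step in the preceding abstract, the sketch below matches small gene, drug, and disease dictionaries against a trial record's text and emits co-occurring triples as candidate PGx relationships. The dictionaries, the sample record text, and the simple co-occurrence heuristic are assumptions for illustration; the study's actual pipeline and its evaluation are not reproduced here.

```python
# Toy dictionaries; a real system would use much larger terminologies.
genes = {"HLA-B", "CYP2C9", "VKORC1"}
drugs = {"carbamazepine", "warfarin"}
diseases = {"epilepsy", "atrial fibrillation"}

trial_text = ("This trial evaluates carbamazepine dosing in epilepsy patients "
              "screened for HLA-B*15:02 carrier status.")

def extract_candidates(text):
    """Return (gene, drug, disease) triples co-occurring in one trial record."""
    lowered = text.lower()
    found_genes = [g for g in genes if g.lower() in lowered]
    found_drugs = [d for d in drugs if d.lower() in lowered]
    found_diseases = [s for s in diseases if s.lower() in lowered]
    return [(g, d, s) for g in found_genes
            for d in found_drugs
            for s in found_diseases]

print(extract_candidates(trial_text))
```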
Despite the existence of multiple standards for the coding of biomedical data and the known benefits of doing so, there remains a myriad of biomedical information domain spaces that are essentially un-coded and unstandardized. Perhaps a worse situation is when the same or similar information in a given domain is coded to a variety of different standards. Such is the case with cephalometrics – standardized measurements of angles and distances between specified landmarks on X-ray film used for orthodontic treatment planning and a variety of research applications. We describe how we unified the cephalometric definitions from ten existing cephalometric standards into one unifying terminology set using an existing standard (LOINC). Using our example of an open and web-based orthodontic case file system, we describe how this work benefited our project and discuss how adopting or expanding established standards can benefit other similar projects in specialized domains.
Cephalometry; Logical Observation Identifiers Names and Codes; Terminology as Topic; Systems Integration; Information Storage and Retrieval
To support clinical decision-making, computerized information retrieval tools known as “infobuttons” deliver contextually-relevant knowledge resources into clinical information systems. The Health Level Seven International (HL7) Context-Aware Knowledge Retrieval (Infobutton) Standard specifies a standard mechanism to enable infobuttons on a large scale.
To examine the experience of organizations in the course of implementing the HL7 Infobutton Standard.
Cross-sectional online survey and in-depth phone interviews.
A total of 17 organizations participated in the study. Analysis of the in-depth interviews revealed 20 recurrent themes. Implementers underscored the benefits, simplicity, and flexibility of the HL7 Infobutton Standard. Yet, participants voiced the need for easier access to standard specifications and improved guidance to beginners. Implementers predicted that the Infobutton Standard will be widely or at least fairly well adopted in the next five years, but uptake will depend largely on adoption among electronic health record (EHR) vendors. To accelerate EHR adoption of the Infobutton Standard, implementers recommended that HL7-compliant infobutton capabilities be included in the United States Meaningful Use certification criteria for EHR systems.
Opinions and predictions should be interpreted with caution, since all the participant organizations have successfully implemented the Standard and over half of the organizations were actively engaged in the development of the Standard.
Overall, implementers reported a very positive experience with the HL7 Infobutton Standard. Despite indications of increasing uptake, measures should be taken to stimulate adoption of the Infobutton Standard among EHR vendors. Widespread adoption of the Infobutton Standard has the potential to bring contextually relevant clinical decision support content into the healthcare provider workflow.
information need; health information technology; standard; clinical decision support; electronic health record system; knowledge resource
Mapping medical test names into a standardized vocabulary is a prerequisite to sharing test-related data between healthcare entities. One major barrier in this process is the inability to describe tests in sufficient detail to assign the appropriate name in Logical Observation Identifiers Names and Codes (LOINC®). Approaches to address mapping of test names with incomplete information have not been well described. We developed a process of "enhancing" local test names by incorporating information required for LOINC mapping into the test names themselves. When using the Regenstrief LOINC Mapping Assistant (RELMA) we found that 73/198 (37%) of "enhanced" test names were successfully mapped to LOINC, compared to 41/191 (21%) of original names (p=0.001). Our approach led to a significantly higher proportion of test names with successful mapping to LOINC, but further efforts are required to achieve more satisfactory results.
Standardized terminology mapping; LOINC
Interoperable health information exchange depends on adoption of terminology standards, but international use of such standards can be challenging because of language differences between local concept names and the standard terminology. To address this important barrier, we describe the evolution of an efficient process for constructing translations of LOINC term names, the foreign language functions in RELMA, and the current state of translations in LOINC. We also present the development of the Italian translation to illustrate how translation is enabling adoption in international contexts. We built a tool that finds the unique list of LOINC Parts that make up a given set of LOINC terms. This list enables translation of smaller pieces like the core component “hepatitis c virus” separately from all the suffixes that could appear with it, such as “Ab.IgG”, “DNA”, and “RNA”. We built another tool that generates a translation of a full LOINC name from all of these atomic pieces. As of version 2.36 (June 2011), LOINC terms have been translated from the native English into 9 languages, in 15 linguistic variants. The five largest linguistic variants have all used the Part-based translation mechanism. However, even with efficient tools and processes, translation of standard terminology is a complex undertaking. Two of the prominent linguistic challenges that translators have faced include the approach to handling acronyms and abbreviations, and the differences in linguistic syntax (e.g. word order) between languages. LOINC’s open and customizable approach has enabled many different groups to create translations that met their needs and matched their resources. Distributing the standard and its many language translations at no cost worldwide accelerates LOINC adoption globally, and is an important enabler of interoperable health information exchange.
LOINC; Vocabulary, Controlled; Multilingualism; Translating; Clinical Laboratory Information Systems/standards; Medical Records Systems, Computerized/standards
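The Part-based mechanism in the preceding abstract can be pictured with a short sketch: translate each LOINC Part once, then assemble full names from the translated Parts. The Italian strings, the part list, and the colon-delimited layout below are illustrative stand-ins, not the official LOINC translation files.

```python
# One translation per LOINC Part (toy Italian examples).
part_translations_it = {
    "Hepatitis C virus": "Virus epatite C",
    "Ab.IgG": "Ab.IgG",                      # acronyms may deliberately stay as-is
    "Serum": "Siero",
    "Arbitrary concentration": "Concentrazione arbitraria",
}

def translate_part(part, table):
    """Return the translated Part, falling back to the original English text."""
    return table.get(part, part)

def compose_name(parts, table):
    """Assemble a translated term name from its individual Parts."""
    return ":".join(translate_part(p, table) for p in parts)

# The core component "Hepatitis C virus" is translated once and reused with any
# suffix ("Ab.IgG", "DNA", "RNA", ...) and across every term that shares it.
print(compose_name(["Hepatitis C virus", "Ab.IgG", "Serum"], part_translations_it))
```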
Clinical databases provide a rich source of data for answering clinical research questions. However, the variables recorded in clinical data systems are often identified by local, idiosyncratic, and sometimes redundant and/or ambiguous names (or codes) rather than unique, well-organized codes from standard code systems. This reality discourages research use of such databases, because researchers must invest considerable time in cleaning up the data before they can ask their first research question. Researchers at MIT developed MIMIC-II, a nearly complete collection of clinical data about intensive care patients. Because its data are drawn from existing clinical systems, it has many of the problems described above. In collaboration with the MIT researchers, we have begun a process of cleaning up the data and mapping the variable names and codes to LOINC codes. Our first step, which we describe here, was to map all of the laboratory test observations to LOINC codes. We were able to map 87% of the unique laboratory tests, which cover 94% of the total number of laboratory test results. Of the 13% of tests that we could not map, nearly 60% were due to test names whose real meaning could not be discerned and 29% represented tests that were not yet included in the LOINC table. These results suggest that LOINC codes cover most of the laboratory tests used in critical care. We have delivered this work to the MIMIC-II researchers, who have included it in their standard MIMIC-II database release so that researchers who use this database in the future will not have to do this work.
Vocabulary standards; laboratory tests; LOINC; MIMIC-II; secondary use; mapping guidelines
We describe a clinical research visit scheduling system that can potentially coordinate clinical research visits with patient care visits and increase efficiency at clinical sites where clinical and research activities occur simultaneously. Participatory Design methods were applied to support requirements engineering and to create this software, called Integrated Model for Patient Care and Clinical Trials (IMPACT). Using a multi-user constraint satisfaction and resource optimization algorithm, IMPACT automatically synthesizes temporal availability of various research resources and recommends the optimal dates and times for pending research visits. We conducted scenario-based evaluations with 10 clinical research coordinators (CRCs) from diverse clinical research settings to assess the usefulness, feasibility, and user acceptance of IMPACT. We obtained qualitative feedback using semi-structured interviews with the CRCs. Most CRCs acknowledged the usefulness of IMPACT features. Support for collaboration within research teams and interoperability with electronic health records and clinical trial management systems were highly requested features. Overall, IMPACT received satisfactory user acceptance and appears potentially useful for a variety of clinical research settings. Our future work includes comparing the effectiveness of IMPACT with that of existing scheduling solutions on the market and conducting field tests to formally assess user adoption.
workflow; software; personnel staffing and scheduling; health resources; clinical research informatics; learning health systems
We present a novel framework for integrative biomarker discovery from related but separate data sets created in biomarker profiling studies. The framework takes prior knowledge in the form of interpretable, modular rules, and uses them during the learning of rules on a new data set. The framework consists of two methods of transfer of knowledge from source to target data: transfer of whole rules and transfer of rule structures. We evaluated the methods on three pairs of data sets: one genomic and two proteomic. We used standard measures of classification performance and three novel measures of amount of transfer. Preliminary evaluation shows that whole-rule transfer improves classification performance over using the target data alone, especially when there is more source data than target data. It also improves performance over using the union of the data sets.
biomarker discovery; molecular profiling; machine learning; rule learning; transfer learning
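A minimal sketch of the whole-rule transfer idea from the preceding abstract, under stated assumptions: rules learned on the source data are re-evaluated on the target data and retained only if they remain sufficiently accurate there. The rule representation, accuracy threshold, and toy biomarker data are illustrative, not the paper's rule learner.

```python
def rule_accuracy(rule, data):
    """Fraction of covered samples whose label matches the rule's prediction."""
    covered = [(x, y) for x, y in data if rule["condition"](x)]
    if not covered:
        return 0.0
    return sum(1 for _, y in covered if y == rule["predicts"]) / len(covered)

def transfer_whole_rules(source_rules, target_data, min_accuracy=0.7):
    """Keep source rules that still hold on the target data (threshold assumed)."""
    return [r for r in source_rules
            if rule_accuracy(r, target_data) >= min_accuracy]

# Example: a proteomic biomarker rule expressed as a threshold on one feature.
source_rules = [
    {"name": "high_marker_A",
     "condition": lambda sample: sample["marker_A"] > 2.5,
     "predicts": "case"},
]
target_data = [({"marker_A": 3.1}, "case"),
               ({"marker_A": 1.2}, "control"),
               ({"marker_A": 2.8}, "case")]

print([r["name"] for r in transfer_whole_rules(source_rules, target_data)])
```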
The era of “Personalized Medicine,” guided by individual molecular variation in DNA, RNA, expressed proteins and other forms of high volume molecular data brings new requirements and challenges to the design and implementation of Electronic Health Records (EHRs). In this article we describe the characteristics of biomolecular data that differentiate it from other classes of data commonly found in EHRs, enumerate a set of technical desiderata for its management in healthcare settings, and offer a candidate technical approach to its compact and efficient representation in operational systems.
Electronic Health Records; Genomics; Knowledge representation; Data compression
An open research question when leveraging ontological knowledge is when to treat different concepts separately from each other and when to aggregate them. For instance, concepts for the terms "paroxysmal cough" and "nocturnal cough" might be aggregated in a kidney disease study, but should be left separate in a pneumonia study. Determining whether two concepts are similar enough to be aggregated can help build better datasets for data mining purposes and avoid signal dilution. Quantifying the similarity among concepts is a difficult task, however, in part because such similarity is context-dependent. We propose a comprehensive method, which computes a similarity score for a concept pair by combining data-driven and ontology-driven knowledge. We demonstrate our method on concepts from SNOMED-CT and on a corpus of clinical notes of patients with chronic kidney disease. By combining information from usage patterns in clinical notes and from ontological structure, the method can separate concepts that are merely related from those that are semantically similar. When evaluated against a list of concept pairs annotated for similarity, our method reaches an AUC (area under the curve) of 92%.
Semantic similarity; SNOMED-CT; distributional semantics; graph-based metrics; ontologies
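One way to picture the combination described in the preceding abstract is a weighted blend of a data-driven score (cosine similarity of context vectors from clinical notes) and an ontology-driven score (a taxonomy-depth measure over SNOMED-CT). The toy vectors, depths, Wu-Palmer-style formula, and equal weighting below are illustrative assumptions rather than the paper's exact method.

```python
import math

def cosine(u, v):
    """Cosine similarity of two co-occurrence vectors (data-driven score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def path_similarity(depth_lcs, depth_a, depth_b):
    """Wu-Palmer-style score from taxonomy depths (ontology-driven score)."""
    return 2 * depth_lcs / (depth_a + depth_b)

def combined_similarity(vec_a, vec_b, depth_lcs, depth_a, depth_b, w=0.5):
    """Weighted blend of the distributional and ontological scores (w assumed)."""
    return (w * cosine(vec_a, vec_b)
            + (1 - w) * path_similarity(depth_lcs, depth_a, depth_b))

# Toy context vectors built from co-occurrence counts in clinical notes, plus
# assumed depths of two SNOMED-CT concepts and of their closest shared ancestor.
vec_nocturnal_cough = [3, 0, 5, 1]
vec_paroxysmal_cough = [2, 1, 4, 1]
print(combined_similarity(vec_nocturnal_cough, vec_paroxysmal_cough,
                          depth_lcs=6, depth_a=7, depth_b=7))
```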
A method for automated location of shape differences in diseased anatomical structures via high resolution biomedical atlases annotated with labels from formal ontologies is described. In particular, a high resolution magnetic resonance image of the myocardium of the human left ventricle was segmented and annotated with structural terms from an extracted subset of the Foundational Model of Anatomy ontology. The atlas was registered to the end systole template of a previous study of left ventricular remodeling in cardiomyopathy using a diffeomorphic registration algorithm. The previous study used thresholding and visual inspection to locate a region of statistical significance which distinguished patients with ischemic cardiomyopathy from those with nonischemic cardiomyopathy. Using semantic technologies and the deformed annotated atlas, we located this region more precisely. Although this study used only a cardiac atlas, it provides a proof-of-concept that ontologically labeled biomedical atlases of any anatomical structure can be used to automate location-based inferences.
Ontologies; Medical Atlases; Cardiac Left Ventricle; Computational Anatomy
To systematically review current health literacy (HL) instruments for use in consumer-facing and mobile health information technology screening and evaluation tools.
The PubMed, OVID, Google Scholar, Cochrane Library and Science Citation Index databases were searched for health literacy assessment instruments using the terms “health”, “literacy”, “computer-based,” and “psychometrics”. All instruments identified by this method were critically appraised according to their reported psychometric properties and clinical feasibility.
Eleven different health literacy instruments were found. Screening questions, such as asking a patient about his/her need for assistance in navigating health information, were evaluated in 7 different studies and are promising for use as a valid, reliable, and feasible computer-based approach to identify patients who struggle with low health literacy. However, there was a lack of consistency in the types of screening questions proposed. There is also a lack of information regarding the psychometric properties of computer-based health literacy instruments.
Only English language health literacy assessment instruments were reviewed and analyzed.
Current health literacy screening tools demonstrate varying benefits depending on the context of their use. In many cases, it seems that a single screening question may be a reliable, valid, and feasible means for establishing health literacy. A combination of screening questions that assess health literacy and technological literacy may enable tailoring eHealth applications to user needs. Further research should determine the best screening question(s) and the best synthesis of various instruments’ content and methodologies for computer-based health literacy screening and assessment.
Health Literacy; systematic review
Handoff is an intra-disciplinary process, yet the flow of critical handoff information spans multiple disciplines. Understanding this information flow is important for the development of computer-based tools that support the communication and coordination of patient care in a multi-disciplinary and highly specialized critical care setting. We aimed to understand the structure, functionality, and content of nurses’ and physicians’ handoff artifacts.
We analyzed 22 nurses’ and physicians’ handoff artifacts from a Cardiothoracic Intensive Care Unit (CTICU) at a large urban medical center. We combined artifact analysis with semantic coding based on our published Interdisciplinary Handoff Information Coding (IHIC) framework for a novel two-step data analysis approach.
We found a high degree of structure and overlap in the content of nursing and physician artifacts. Our findings demonstrated a non-technical, yet sophisticated, system with a high degree of structure for the organization and communication of patient data that functions to coordinate the work of multiple disciplines in a highly specialized unit of patient care.
This study took place in one CTICU. Further work is needed to determine the generalizability of the results.
Our findings indicate that the development of semi-structured patient-centered interdisciplinary handoff tools with discipline specific views customized for specialty settings may effectively support handoff communication and patient safety.
Handoff; Interdisciplinary; Patient-centered
The Unified Medical Language System (UMLS) is the largest thesaurus in the biomedical informatics domain. Previous works have shown that knowledge constructs composed of transitively associated UMLS concepts are effective for discovering potentially novel biomedical hypotheses. However, the extremely large size of the UMLS becomes a major challenge for these applications. To address this problem, we designed a k-neighborhood Decentralization Labeling Scheme (kDLS) for the UMLS, and the corresponding method to effectively evaluate the kDLS indexing results. kDLS provides a comprehensive solution for indexing the UMLS for very efficient large scale knowledge discovery. We demonstrated that it is highly effective to use kDLS paths to prioritize disease-gene relations across the whole genome, with extremely high fold-enrichment values. To our knowledge, this is the first indexing scheme capable of supporting efficient large scale knowledge discovery on the UMLS as a whole. Our expectation is that kDLS will become a vital engine for retrieving information and generating hypotheses from the UMLS for future medical informatics applications.
UMLS; Knowledge Discovery; Graph Database; disease gene prioritization; fold enrichment
Supervised machine learning methods for clinical natural language processing (NLP) research require a large number of annotated samples, which are very expensive to build because of the involvement of physicians. Active learning, an approach that actively samples from a large pool, provides an alternative solution. Its major goal in classification is to reduce the annotation effort while maintaining the quality of the predictive model. However, few studies have investigated its uses in clinical NLP. This paper reports an application of active learning to a clinical text classification task: to determine the assertion status of clinical concepts. The annotated corpus for the assertion classification task in the 2010 i2b2/VA Clinical NLP Challenge was used in this study. We implemented several existing and newly developed active learning algorithms and assessed their uses. The outcome is reported as the global ALC score, the area under the average learning curve of AUC (area under the curve) values. Results showed that when the same number of annotated samples was used, active learning strategies could generate better classification models (best ALC = 0.7715) than the passive learning method of random sampling (ALC = 0.7411). Moreover, to achieve the same classification performance, active learning strategies required fewer samples than the random sampling method. For example, to achieve an AUC of 0.79, the random sampling method used 32 samples, while our best active learning algorithm required only 12 samples, a reduction of 62.5% in manual annotation effort.
Active learning; Natural language processing; Clinical text classification; Machine learning
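A compact sketch of the general setup in the preceding abstract: pool-based active learning with uncertainty sampling, tracked by an ALC-style score (the area under the learning curve of AUC values). The synthetic data, logistic-regression learner, batch size, and number of rounds are placeholders, not the study's algorithms or corpus.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the assertion-classification data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

# Seed the "annotated" set with a few examples from each class.
labeled = list(np.where(y_pool == 0)[0][:5]) + list(np.where(y_pool == 1)[0][:5])
unlabeled = [i for i in range(len(y_pool)) if i not in labeled]
aucs = []

for _ in range(15):                            # 15 querying rounds of 5 samples each
    model = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_pool[labeled])
    aucs.append(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
    # Uncertainty sampling: pick the pool samples with probability closest to 0.5.
    probs = model.predict_proba(X_pool[unlabeled])[:, 1]
    picked = [unlabeled[i] for i in np.argsort(np.abs(probs - 0.5))[:5]]
    labeled.extend(picked)
    unlabeled = [i for i in unlabeled if i not in picked]

# ALC: trapezoidal area under the learning curve of AUC values on a 0-1 axis.
alc = sum((aucs[i] + aucs[i + 1]) / 2 for i in range(len(aucs) - 1)) / (len(aucs) - 1)
print("ALC:", round(alc, 4))
```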
Although trauma is the leading cause of death for those below 45 years of age, there is a dearth of information about the temporal behavior of the underlying biological mechanisms in those who survive the initial trauma only to later suffer from syndromes such as multiple organ failure. Levels of serum cytokines potentially affect the clinical outcomes of trauma; understanding how cytokine levels modulate intra-cellular signaling pathways can yield insights into molecular mechanisms of disease progression and help to identify targeted therapies. However, developing such analyses is challenging since it necessitates the integration and interpretation of large amounts of heterogeneous, quantitative and qualitative data. Here we present the Pathway Semantics Algorithm (PSA), an algebraic process of node and edge analyses of evoked biological pathways over time for in silico discovery of biomedical hypotheses, using data from a prospective controlled clinical study of the role of cytokines in multiple organ failure (MOF) at a major US trauma center. A matrix algebra approach was used in both the PSA node and PSA edge analyses with different matrix configurations and computations based on the biomedical questions to be examined. In the edge analysis, a percentage measure of crosstalk called XTALK was also developed to assess cross-pathway interference.
In the node/molecular analysis of the first 24 hours from trauma, PSA uncovered 7 computationally evoked molecules that differentiated MOF from non-MOF (NMOF) outcomes, of which 3 molecules had not been previously associated with any shock/trauma syndrome. In the edge/molecular interaction analysis, PSA examined four categories of functional molecular interaction relationships – activation, expression, inhibition, and transcription – and found that the interaction patterns and crosstalk changed over time and outcome. The PSA edge analysis suggests that a diagnosis, prognosis or therapy based on molecular interaction mechanisms may be most effective within a certain time period and for a specific functional relationship.
Systems biology; signaling pathways; trauma; hypothesis generation; biomedical informatics
Information extraction applications that extract structured event and entity information from unstructured text can leverage knowledge of clinical report structure to improve performance. The SOAP (Subjective, Objective, Assessment, Plan) framework, used to structure progress notes to facilitate problem-specific, clinical decision making by physicians, is one example of a well-known, canonical structure in the medical domain. Although its applicability to structuring data is understood, its contribution to information extraction tasks has not yet been determined. The first step to evaluating the SOAP framework’s usefulness for clinical information extraction is to apply the model to clinical narratives and develop an automated SOAP classifier that classifies sentences from clinical reports. In this quantitative study, we applied the SOAP framework to sentences from emergency department reports, and trained and evaluated SOAP classifiers built with various linguistic features. We found that the SOAP framework can be applied manually to emergency department reports with high agreement (Cohen’s kappa coefficients over 0.70). Using a variety of features, we found that classifiers for each SOAP class can be created with moderate to outstanding performance, with F1 scores of 93.9 (subjective), 94.5 (objective), 75.7 (assessment), and 77.0 (plan). We look forward to expanding the framework and applying the SOAP classification to clinical information extraction tasks.
Support Vector Machine; SOAP notes; Problem Oriented Medical Record; POMR; prediction
Extracting concepts (such as drugs, symptoms, and diagnoses) from clinical narratives constitutes a basic enabling technology to unlock the knowledge within and support more advanced reasoning applications such as diagnosis explanation, disease progression modeling, and intelligent analysis of the effectiveness of treatment. The recent release of annotated training sets of de-identified clinical narratives has contributed to the development and refinement of concept extraction methods. However, as the annotation process is labor-intensive, training data are necessarily limited in the concepts and concept patterns covered, which impacts the performance of supervised machine learning applications trained with these data. This paper proposes an approach to minimize this limitation by combining supervised machine learning with empirical learning of semantic relatedness from the distribution of the relevant words in additional unannotated text.
The approach uses a sequential discriminative classifier (Conditional Random Fields) to extract the mentions of medical problems, treatments and tests from clinical narratives. It takes advantage of all MEDLINE abstracts indexed as being of the publication type “clinical trials” to estimate the relatedness between words in the i2b2/VA training and testing corpora. In addition to traditional features such as dictionary matching, pattern matching and part-of-speech tags, we also used as features the words that appear in similar contexts to the word in question (that is, words that have a similar vector representation measured with the commonly used cosine metric, where vector representations are derived using methods of distributional semantics). To the best of our knowledge, this is the first effort exploring the use of distributional semantics, the semantics derived empirically from unannotated text often using vector space models, for a sequence classification task such as concept extraction. Therefore, we first experimented with different sliding window models and selected the model parameters that led to the best performance in a preliminary sequence labeling task.
The evaluation of this approach, performed against the i2b2/VA concept extraction corpus, showed that incorporating features based on the distribution of words across a large unannotated corpus significantly aids concept extraction. Compared to a supervised-only approach as a baseline, the micro-averaged f-measure for exact match increased from 80.3% to 82.3% and the micro-averaged f-measure based on inexact match increased from 89.7% to 91.3%. These improvements are highly significant according to the bootstrap resampling method and also considering the performance of other systems. Thus, distributional semantic features significantly improve the performance of concept extraction from clinical narratives by taking advantage of word distribution information obtained from unannotated data.
NLP; Information extraction; NER; Distributional Semantics; Clinical Informatics
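The distributional feature described above can be sketched in a few lines: build co-occurrence vectors for each word from unannotated text, then attach each token's most cosine-similar vocabulary words as extra features for the sequence labeler. The miniature corpus, window size, and number of neighbors below are illustrative; the study derived its vectors from MEDLINE clinical-trial abstracts and combined the resulting features with dictionary, pattern, and part-of-speech features in a CRF.

```python
import math
from collections import Counter, defaultdict

# Tiny unannotated "corpus"; a real system would use millions of abstracts.
corpus = [
    "patient denies chest pain and shortness of breath".split(),
    "chest pain improved after nitroglycerin administration".split(),
    "shortness of breath improved after albuterol administration".split(),
]

window = 2
cooc = defaultdict(Counter)          # word -> counts of words seen nearby
for sent in corpus:
    for i, word in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                cooc[word][sent[j]] += 1

def cosine(c1, c2):
    """Cosine similarity between two sparse co-occurrence vectors."""
    dot = sum(c1[k] * c2[k] for k in set(c1) & set(c2))
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    return dot / norm if norm else 0.0

def distributional_features(token, top_k=2):
    """Most similar vocabulary words, to be added to the token's CRF features."""
    sims = [(other, cosine(cooc[token], cooc[other]))
            for other in cooc if other != token]
    return [w for w, _ in sorted(sims, key=lambda pair: -pair[1])[:top_k]]

print(distributional_features("nitroglycerin"))
```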
The current volume and complexity of genetic tests, and the molecular genetics knowledge and health knowledge related to interpretation of the results of those tests, are rapidly outstripping the ability of individual clinicians to recall, understand and convey to their patients information relevant to their care. The tailoring of molecular genetics knowledge and health knowledge in clinical settings is important both for the provision of personalized medicine and to reduce clinician information overload. In this paper we describe the incorporation, customization and demonstration of molecular genetic data (mainly sequence variants), molecular genetics knowledge and health knowledge into a standards-based electronic health record (EHR) prototype developed specifically for this study.
We extended the CCR (Continuity of Care Record), an existing EHR standard for representing clinical data, to include molecular genetic data. An EHR prototype was built based on the extended CCR and designed to display relevant molecular genetics knowledge and health knowledge from an existing knowledge base for cystic fibrosis (OntoKBCF). We reconstructed test records from published case reports and represented them in the CCR schema. We then used the EHR to dynamically filter molecular genetics knowledge and health knowledge from OntoKBCF using molecular genetic data and clinical data from the test cases.
The molecular genetic data were successfully incorporated in the CCR by creating a category of laboratory results called “Molecular Genetics” and specifying a particular class of test (“Gene Mutation Test”) in this category. Unlike other laboratory tests reported in the CCR, results of tests in this class required additional attributes (“Molecular Structure” and “Molecular Position”) to support interpretation by clinicians. These results, along with clinical data (age, sex, ethnicity, diagnostic procedures, and therapies), were used by the EHR to filter and present molecular genetics knowledge and health knowledge from OntoKBCF.
This research shows a feasible model for delivering patient sequence variants and presenting tailored molecular genetics knowledge and health knowledge via a standards-based EHR system prototype. EHR standards can be extended to include the necessary patient data (as we have demonstrated in the case of the CCR), while knowledge can be obtained from external knowledge bases that are created and maintained independently from the EHR. This approach can form the basis for a personalized medicine framework, a more comprehensive standards-based EHR system and a potential platform for advancing translational research by both disseminating results and providing opportunities for new insights into phenotype-genotype relationships.
Molecular Genetic Information; Sequence variants; Electronic Health Record; Personalized information; Standards; Information filters
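Purely as an illustration of the filtering step described above, the sketch below matches a "Gene Mutation Test" result (with the extra structure and position attributes) and basic clinical data against knowledge-base entries to select patient-relevant statements. The field names, record layout, and knowledge entries are hypothetical stand-ins, not the actual CCR extension or OntoKBCF content.

```python
# Hypothetical, simplified patient record based on the extended lab-result class.
patient_record = {
    "age": 4, "sex": "F",
    "lab_results": [{
        "category": "Molecular Genetics",
        "test_class": "Gene Mutation Test",
        "gene": "CFTR",
        "variant": "F508del",
        "molecular_structure": "c.1521_1523delCTT",   # additional attribute
        "molecular_position": "exon 11",               # additional attribute
    }],
}

# Hypothetical knowledge entries; the real knowledge comes from OntoKBCF.
knowledge_base = [
    {"gene": "CFTR", "variant": "F508del", "min_age": 0, "max_age": 18,
     "statement": "Pediatric guidance for patients carrying F508del ..."},
    {"gene": "CFTR", "variant": "G551D", "min_age": 0, "max_age": 120,
     "statement": "Guidance specific to G551D ..."},
]

def tailored_knowledge(record, kb):
    """Return knowledge statements matching the patient's variants and age."""
    variants = {(r["gene"], r["variant"]) for r in record["lab_results"]}
    return [entry["statement"] for entry in kb
            if (entry["gene"], entry["variant"]) in variants
            and entry["min_age"] <= record["age"] <= entry["max_age"]]

print(tailored_knowledge(patient_record, knowledge_base))
```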