AMIA Annu Symp Proc. 2010; 2010: 477–481.
Published online 2010 November 13.
PMCID: PMC3041287

Managing Medical Vocabulary Updates in a Clinical Data Warehouse: An RxNorm Case Study


Use of terminology standards facilitates aggregating data from multiple sources for information retrieval, exchange and analysis. However, medical vocabularies are continuously updated, and incorporating those changes consistently into clinical data warehouses requires rigorous methodology. To integrate pharmacy data from two hospital pharmacy information systems, the Stanford Translational Research Integrated Database Environment (STRIDE) project mapped medication orders to RxNorm content using the RxNorm drug model. In order to keep the data relevant and up-to-date, we developed a strategy for updating to new RxNorm releases while preserving the original meaning and mapping of the legacy data. This case study discusses managing the vocabulary update by following the RxNorm content maintenance strategy and supplementing it with operations to retain access to its drug model information.


Medical knowledge evolves and, as a result, its codification in medical terminologies is a continuous process. Yet, the integrity of historical patient data needs to be maintained by retaining the original meaning of any coded data. To facilitate the process of medical vocabulary updates, strategies for terminology maintenance in medical vocabularies have also been defined. Cimino1 proposed an effective taxonomy of semantic changes in source vocabularies consisting of simple addition, refinement, pre-coordination, disambiguation, redundancy, obsolescence, discovered redundancy, minor name change, major name change, code reuse, and code change, as well as methods for handling each type of change. This change model was demonstrated in the context of updating ICD-9-CM within the Columbia University Medical Entities Dictionary (MED). Baorto et al.2 shared the practical methods used for auditing and long-term maintenance of both local and standard terminologies within the MED. Oliver et al.3 defined a change model for a concept-based vocabulary and the most essential operations which need to be included. Bakhshi-Raiez et al.4 developed a framework delineating the critical features of the maintenance process for medical terminological systems, which is also informative for implementing terminology updates within clinical research applications. These terminology maintenance frameworks have been useful in helping us design a maintenance strategy for controlled vocabularies used in the STRIDE Clinical Data Warehouse, a component of the Stanford Translational Research Integrated Database Environment (STRIDE)5. However, these terminology maintenance frameworks cover concept maintenance and presume substantial local knowledge representation maintenance. 
With the goal of minimizing local knowledge creation, improving data retrieval6 and preserving the original meaning of data longitudinally, pharmacy orders in STRIDE were mapped to RxNorm, a standardized nomenclature for clinical drugs produced by the National Library of Medicine. To streamline knowledge representation maintenance, the RxNorm concepts and drug model were fully loaded into STRIDE. Being a concept-based terminology, RxNorm explicitly indicates semantic changes through changes in the concept unique identifiers, a practice which aids in detecting change. RxNorm also maintains concept and atom histories, but the user is still responsible for detecting changes in semantic relationships and assessing the impact of change on their content. Semantic relationships themselves provide concept definitions and can serve to validate content. Bodenreider et al.7 have used the semantic relationships in a graph-based approach to fully traverse the RxNorm model as a way of validating the exhaustiveness and correctness of the content.

This paper describes concept history tracking within RxNorm and its utilization within STRIDE. Our criteria for a successful RxNorm maintenance model were to:

  • Retain access to the original meaning of the mapped RxNorm concept.
  • Preserve navigation from branded drugs to their generic ingredients, when the ingredients were used for mapping.
  • Continue to locate the drug class of any generic ingredient concept in RxNorm by accessing its SNOMED CT term and navigating the SNOMED CT Pharmaceutical/Biologic Product hierarchy.


STRIDE receives data via HL7 messages from clinical information systems at the Stanford University Medical Center adult and pediatric hospitals and manages the data for clinical research use. Each hospital defines its own formulary and each uses a separate drug information vendor. The components of the pharmacy order, which STRIDE automatically identifies as a formulary concept per hospital, were algorithmically mapped to RxNorm concepts at the level of ingredient.8 Because the majority of drug orders contain generic drug names and the actual administration may or may not use branded drugs, we only assert generic ingredients.

The update task is dependent on the selected implementation of controlled medical vocabularies – in our case, on loading RxNorm fully, mapping to it from two local drug source terminologies within STRIDE, and using the RxNorm drug model to navigate from branded to generic drugs.

RxNorm codifies its drug model through categories, called “term types” (TTY), ingredient being one of them. RxNorm further explicates its drug representation through relationships between the normalized forms of clinical drug names, e.g. “tradename_of”.

We chose to load RxNorm fully, as both an informational and mapping resource, and use its inherent drug model and supplied synonymy to publicly available vocabularies. In RxNorm, concepts and their atoms are added (e.g., when a vocabulary submits a term to RxNorm for a new drug on the market), changed (e.g., when a concept representation in the model is altered or a duplicate concept is identified), and retired (e.g., when all vocabularies stop submitting terms for a drug). RxNorm retains all concepts and atoms over time with indications of their state. However, concepts are also defined by their relationships. For example, the branded drug “Guaifed” (RXCUI=217413) has two clinically significant ingredients, “Guaifenesin” and “Phenylephrine”, indicated through “has_tradename” and reciprocal “tradename_of” relationships. The RxNorm model uses a built-in mechanism for tracking reformulations. For example, the replacement of pseudoephedrine with phenylephrine in the formulation of Guaifed was communicated by linking “Guaifed” (RXCUI=217413) through the relationships “reformulation_of” and “reformulated_to” to “Guaifed Reformulated Apr 2009” (RXCUI=847761), which retained relationships to its ingredients “Guaifenesin” and “Pseudoephedrine”. In other words, a new RXCUI was created for the old formula, while the original RXCUI persists over time with new relationships.

RxNorm contains semantic knowledge, but does not always retain it for retired concepts. In the case of “Vicodin”, due to a change in the RxNorm model to include strength in brand names (BNs), the “Vicodin” concept was suppressed and all its relationships to ingredients (INs) were removed. The INs are now linked to the newly created BN RXCUIs: “Vicodin 7.5/750” (RXCUI=856895), “Vicodin 5/500” (RXCUI=856904), and “Vicodin 10/660” (RXCUI=856909). STRIDE relies on the RxNorm drug model to identify the BNs and branded pack synonyms (BPCK SYs) that contain our mapped IN combinations, for auto-complete suggestions in the interface. As the BN examples above show, it becomes incumbent on users to preserve the historic semantic relationships they use. This means that users need to supplement the current RxNorm model with historical data they derive from its earlier versions. It would be more useful to have an archive for relationships, such as the one for atoms and concepts, both for reporting changes in meaning and for corrections.

Maximizing the use of the concept and atom history already tracked by RxNorm allows us to streamline terminology maintenance within STRIDE and retain the original meaning of legacy pharmacy data. But to fully preserve the knowledge provided by the RxNorm drug model, it was also necessary to keep the relevant relationships, which formally define the mapped concept and are necessary for traversing the RxNorm drug model. Ideally, terminologies will carry their historical knowledge in a computable form, so that users would not need to create and maintain local copies, but would rely on APIs, such as RxNav9, for information access. A standard for representing change in terminologies, which clearly communicates changes both for concepts and their semantic relationships, will facilitate terminology implementation and reduce knowledge loss.


For the update task, the November 2009 full version (11022009)10 and the April 2009 full version (04062009)11 of RxNorm were loaded into separate schemas within the STRIDE Oracle 11g database using the NLM-supplied data loading scripts. SQL queries were then run to identify changes in the RxNorm content and model.

The concept comparison was approached as an iterative process with progressively more granular levels of detecting changes between the old and new versions of the terminology with a focus on concepts we mapped or used, i.e. INs, BNs, BPCKs and BPCK SYs. RxNorm organizes all terms contributed by source vocabularies and the terms it creates into concepts stored in the RXNCONSO table. RxNorm retains a concept unique identifier (RXCUI) history. No RXCUIs are deleted, but they are retired when all RxNorm source vocabularies cease to contribute terms; at that point, the SUPPRESS flag for the concept is turned on.

To ensure continued access to the original meaning of any RxNorm concept used in mapping, we first verify that there are no changes in the model by confirming that the term types in the RXNCONSO table have remained unchanged. We then verify that no concepts have disappeared, i.e. that mapped RXCUIs are present in both versions and hence their meaning has remained unchanged. We also compare the concept properties RXCUI, TTY and SUPPRESS of RxNorm atoms between the old and new versions, but do so for informational reasons only; all retired or suppressed concepts are retained in order to interpret historical data.
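The checks in this step can be sketched in a few lines of Python. This is a minimal illustration, not the SQL that was run against the Oracle schemas: it assumes each RxNorm version has already been reduced to one (TTY, SUPPRESS) pair per RXCUI, which simplifies RXNCONSO, where a concept has many atoms. The function name `compare_versions`, the `Concept` type, the identifiers `X-1` and `X-2`, and the flag values `N` and `O` are all hypothetical and chosen for the example.

```python
from typing import Dict, NamedTuple, Set


class Concept(NamedTuple):
    tty: str       # term type, e.g. "IN" or "BN"
    suppress: str  # SUPPRESS flag value (illustrative values only)


def compare_versions(old: Dict[str, Concept],
                     new: Dict[str, Concept],
                     mapped: Set[str]) -> Dict[str, Set[str]]:
    """Report model-level and concept-level differences for mapped RXCUIs."""
    old_ttys = {c.tty for c in old.values()}
    new_ttys = {c.tty for c in new.values()}
    return {
        # Step 1: has the set of term types (the model) changed?
        "added_term_types": new_ttys - old_ttys,
        "dropped_term_types": old_ttys - new_ttys,
        # Step 2: are all mapped RXCUIs still present in the new version?
        "missing_mapped": {r for r in mapped if r not in new},
        # Step 3 (informational): did TTY or SUPPRESS change?
        "changed_mapped": {r for r in mapped
                           if r in old and r in new and old[r] != new[r]},
    }


# Toy rows; flag values are not taken from an actual release.
OLD = {
    "214824": Concept("IN", "N"),   # Sevelamer
    "660890": Concept("IN", "N"),   # Sevelamer carbonate
    "X-1": Concept("BN", "N"),      # hypothetical brand name
}
NEW = {
    "214824": Concept("IN", "N"),
    "660890": Concept("PIN", "N"),  # reclassified as a precise ingredient
    "X-1": Concept("BN", "O"),      # hypothetical: flag changed
}
MAPPED = {"214824", "660890", "X-1", "X-2"}  # "X-2" is absent from NEW

report = compare_versions(OLD, NEW, MAPPED)
for name, values in report.items():
    print(name, sorted(values))
```

Items reported under `changed_mapped` are kept rather than dropped, in line with the decision to retain retired and suppressed concepts for interpreting historical data.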

RxNorm explicitly tracks semantic changes and other concept and atom corrections in the RXNATOMARCHIVE table; concept revisions are specifically indicated by the presence of RxNorm term types (IN, PIN, SCD, SCDC, SCDF, GPCK, BN, SBD, SBDC, SBDF, BPCK) and a difference in the values of RXCUI and MERGED_TO_RXCUI. The RXNATOMARCHIVE table includes a variety of merges: atom merges from one concept to another, resolution of atom or concept duplicates found during the resynchronization with the UMLS, and corrections for admitted errors in submissions from source vocabularies. The RXNCUICHANGES table contains RXCUI changes only since the prior RxNorm release. Lastly, any RXCUIs that remain unaccounted for are identified and marked for manual research, although the RxNorm built-in history tracking helps to account for all RXCUIs.
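The rule for recognizing a concept revision in the archive can be expressed as a simple filter. The sketch below is an assumption-laden illustration: it takes archive rows as plain (RXCUI, TTY, MERGED_TO_RXCUI) tuples, and the helper names `concept_revisions` and `resolve` are hypothetical. Following a chain of successive merges in `resolve` is an added convenience, not something described in the text above.

```python
from typing import Dict, Iterable, Optional, Tuple

# Term types listed in the text as indicating a concept revision.
RXNORM_TTYS = frozenset({"IN", "PIN", "SCD", "SCDC", "SCDF", "GPCK",
                         "BN", "SBD", "SBDC", "SBDF", "BPCK"})

ArchiveRow = Tuple[str, str, Optional[str]]  # (rxcui, tty, merged_to_rxcui)


def concept_revisions(rows: Iterable[ArchiveRow]) -> Dict[str, str]:
    """Map each revised RXCUI to the RXCUI it was merged into."""
    merges: Dict[str, str] = {}
    for rxcui, tty, merged_to in rows:
        # A revision needs an RxNorm term type and a different target RXCUI.
        if tty in RXNORM_TTYS and merged_to and merged_to != rxcui:
            merges[rxcui] = merged_to
    return merges


def resolve(rxcui: str, merges: Dict[str, str]) -> str:
    """Follow merges to the surviving RXCUI (guarding against cycles)."""
    seen = set()
    while rxcui in merges and rxcui not in seen:
        seen.add(rxcui)
        rxcui = merges[rxcui]
    return rxcui


ROWS = [
    ("798436", "IN", "798444"),  # vaccine concept merged (from the text)
    ("353114", "IN", "9627"),    # synthetic secretin -> Secretin
    ("9627", "IN", "9627"),      # hypothetical row: same RXCUI, ignored
    ("X-9", "OCD", "X-10"),      # hypothetical row: TTY not in the list
]

merges = concept_revisions(ROWS)
print(merges)
print(resolve("353114", merges))
```

Any mapped RXCUI that is neither present in the new version nor a key in `merges` would fall into the group marked for manual research.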

While RXCUI presence in the RXNCONSO tables of both RxNorm versions signifies that the concept remained unchanged, it is still necessary to verify the concept status and the presence of all its defining relationships in order to navigate from branded drugs to generic ingredients. The meaning of concepts narrower than IN, such as BNs and BPCKs, was reconstituted by assembling their component IN concepts. An unchanged array of INs for BNs and BPCKs between versions is an additional verification of their unchanged meaning.
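Comparing the array of INs per brand between versions amounts to a set comparison. The following sketch assumes the relevant relationship rows have already been reduced to (brand, ingredient) pairs; the function names, the placeholder identifiers `BN-1` and `IN-A`, and the use of ingredient names instead of RXCUIs are assumptions made for readability. The Guaifed rows are toy data loosely modeled on the example discussed in this paper.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Tuple

Pair = Tuple[str, str]  # (brand identifier, ingredient)


def ingredient_sets(pairs: Iterable[Pair]) -> Dict[str, FrozenSet[str]]:
    """Group ingredients by brand to form a set-based definition."""
    grouped = defaultdict(set)
    for brand, ingredient in pairs:
        grouped[brand].add(ingredient)
    return {brand: frozenset(ins) for brand, ins in grouped.items()}


def changed_definitions(old_pairs: Iterable[Pair], new_pairs: Iterable[Pair]):
    """Return brands whose ingredient set differs between versions."""
    old_sets = ingredient_sets(old_pairs)
    new_sets = ingredient_sets(new_pairs)
    changes = {}
    for brand, old_ins in old_sets.items():
        # A brand with no remaining relationships gets an empty set.
        new_ins = new_sets.get(brand, frozenset())
        if new_ins != old_ins:
            changes[brand] = (old_ins, new_ins)
    return changes


APRIL = [
    ("217413", "Guaifenesin"),
    ("217413", "Phenylephrine"),
    ("217413", "Pseudoephedrine"),
    ("BN-1", "IN-A"),  # hypothetical brand that loses its relationships
]
NOVEMBER = [
    ("217413", "Guaifenesin"),
    ("217413", "Phenylephrine"),
    ("847761", "Guaifenesin"),
    ("847761", "Pseudoephedrine"),
]

changes = changed_definitions(APRIL, NOVEMBER)
for brand, (before, after) in changes.items():
    print(brand, sorted(before), "->", sorted(after))
```

An empty result for a mapped BN or BPCK is the additional verification described above; a non-empty one points to either a reformulation or a removed relationship.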

Finally, we verify that synonyms from vocabularies of interest are retained and specifically that SNOMED CT terms, used to locate the drug in the SNOMED CT hierarchy and establish its drug class in the SNOMED CT Pharmaceutical/biological product hierarchy, remain. Any term created by a source vocabulary, but no longer contributed, is reclaimed for maintenance purposes by RxNorm and, as of September 2009, moved to the RXNCONSOOCD table with TTY changed to ‘OCD’. The identifying source attributes of the term remain accessible through the ATN and ATV properties in the RXNSATOCD table and can be used to research the term in the history of the contributing terminology.


The concept comparison results for RxNorm full versions released in April 2009 (04062009) versus November 2009 (11022009) are summarized in table 1 below.

Table 1.
RxNorm Version Comparison Results

The high numbers indicate the significant effort in refining normalized forms for existing concepts, especially as a transition in representing normalized forms of branded drugs seems to have occurred during the period of analysis. The counts for additions are based solely on RXCUIs newly created between April and November 2009, regardless of their suppression state. Revision counts include the number of concepts that were merged into other concepts in the April to November 2009 time window. Suppression counts include concepts that acquired the obsolete suppress flag in the time frame of interest, but exclude concepts for which suppression is used as an editing flag and terms are still contributed by other vocabularies.

We further focused our comparison only on concepts we mapped. Of the updates, concept and atom additions are most easily detected, as they require only comparisons of RXNCONSO records across versions. The majority of new INs were extracts. One case was identified where a new concept was created for the new name of an existing concept: “1713:botulinum toxin type B” changed to “860178:rimabotulinumtoxinB”.

Term type classification changes, though infrequent, need to be considered separately from concept changes when the RxNorm model is used for navigating to drug components. Because we imported the RxNorm drug model and mapped at our desired level of granularity, it was necessary to verify the RxNorm model consistency from version to version. In September 2009, PIN was defined as a new term type for prior INs, which are precise ingredients, i.e. salt or isomer forms. This change clarifies the model, but does not change its structure, as PINs could be determined earlier by the existence of “has_form”/“form_of” relationships in RXNREL for their RXCUIs. The explicit term type simplified our comparisons for maintaining the appropriate level of granularity and recognizing content, which had been reclassified to have INs. In this respect, content-wise, only one mapping needed to be adjusted. The prior “Sevelamer carbonate” (RXCUI=660890) IN had been categorized as a salt and its IN refined to just “Sevelamer” (RXCUI=214824). The advantage of using RxNorm is the ability to rely on its content maintenance. In September 2009, OCD atoms were moved from RXNCONSO into RXNCONSOOCD, which provides greater clarity for their status.

The number of concept revisions at the level of granularity we use (IN, BN, and BPCK) is low, which eases our maintenance. Only 2 mapped INs were identified for RXCUI change: “Haemophilus influenzae b, capsular polysaccharide Meningococcal Protein Conjugate Vaccine” (RXCUI=798436) was merged to “Haemophilus influenzae b (Ross strain) capsular polysaccharide Meningococcal Protein Conjugate Vaccine” (RXCUI=798444), and “synthetic secretin” (RXCUI=353114) was replaced by “Secretin” (RXCUI=9627).

Concept retirement or suppression, in some cases, is accompanied by removal of defining relationships. While the RXCUIs for suppressed BNs were retained, their relationships were removed. In order to retain the model navigation for STRIDE, it was necessary to copy the relationships and apply timestamps of start and stop date for their validity.
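One way to picture the locally kept, timestamped relationships is as an archive that is updated with each release. The sketch below is only an assumption about how such a structure could look, not the STRIDE implementation: the `DatedRel` type, the functions `apply_release` and `targets_as_of`, the identifiers `BN-1`, `BN-2`, `IN-A`, `IN-B`, and the choice of the release date as the validity boundary are all hypothetical. The direction in which "tradename_of" is stored is likewise a convention of the example.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (source, relationship, target)


@dataclass(frozen=True)
class DatedRel:
    source: str
    rela: str
    target: str
    valid_from: date
    valid_to: Optional[date] = None  # None means still current

    @property
    def triple(self) -> Triple:
        return (self.source, self.rela, self.target)


def apply_release(archive: List[DatedRel], current: Set[Triple],
                  released: date) -> List[DatedRel]:
    """Close relationships absent from a release and open the new ones."""
    updated, still_open = [], set()
    for row in archive:
        if row.valid_to is None:
            if row.triple in current:
                still_open.add(row.triple)
            else:
                row = replace(row, valid_to=released)  # stop date
        updated.append(row)
    for triple in sorted(current - still_open):
        updated.append(DatedRel(*triple, valid_from=released))  # start date
    return updated


def targets_as_of(archive: List[DatedRel], source: str, rela: str,
                  on: date) -> Set[str]:
    """Targets linked to `source` that were valid on the given date."""
    return {r.target for r in archive
            if r.source == source and r.rela == rela
            and r.valid_from <= on
            and (r.valid_to is None or on < r.valid_to)}


april = {("BN-1", "tradename_of", "IN-A"), ("BN-1", "tradename_of", "IN-B")}
november = {("BN-2", "tradename_of", "IN-A"), ("BN-2", "tradename_of", "IN-B")}

archive = apply_release([], april, date(2009, 4, 6))
archive = apply_release(archive, november, date(2009, 11, 2))

print(targets_as_of(archive, "BN-1", "tradename_of", date(2009, 6, 1)))
print(targets_as_of(archive, "BN-1", "tradename_of", date(2009, 12, 1)))
```

With an archive of this kind, an order mapped before the suppression of a BN can still be navigated to its ingredients by querying with the order date rather than the current release.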

We automatically picked up the updates for drug reformulations in transition. For example, in the April version, “Guaifed” (RXCUI=217413) had “has_tradename” relationships to 3 clinically significant ingredients, “Guaifenesin”, “Phenylephrine” and “Pseudoephedrine”, because the drug content was being processed for representing the reformulation. In the November RxNorm version, the old formula was correctly linked to “Guaifenesin” and “Pseudoephedrine” and the new formula to “Guaifenesin” and “Phenylephrine”. Indicating corrections for relationships would be very useful in screening out invalidated content. Adding explicit time stamps to the whole set of initial relationships that define the reformulation would also make the model navigation easier than parsing the string of the concept, “Guaifed Reformulated Apr 2009”. Nevertheless, loading the RxNorm model has proven extremely useful in picking up updates and corrections for both concepts and relationships.

RxNorm, as an interlingua, allowed us to leverage its mappings of RXCUIs to SNOMED CT concept_ids to utilize the SNOMED CT drug classes (product tree) and support concept grouping by class. This created the ability to automatically pick up one instance of a content correction. A medication term mapping for “IRON DEXTRAN COMPLEX (product)” (RXCUI=5992) was adjusted from the SNOMED CT Veterinary proprietary drug AND/OR biological (product) hierarchy to the Pharmaceutical / biologic product (product) hierarchy. The mappings that RxNorm supports to other drug vocabularies, such as SNOMED CT, NDF-RT, Multum, First Databank and Micromedex, can be used to leverage those additional knowledge bases, when appropriately licensed.


The maintenance evaluation confirmed the utility of the RxNorm built-in mechanism for tracking changes. Use of RxNorm confers the advantages of using a standard terminology which follows the desiderata12 for controlled medical vocabularies: it has rich and well-maintained content with defined concept orientation and assured concept permanence; it uses non-semantic concept identifiers, provides formal definitions through relationships, and aims at graceful evolution. However, users still need to ensure that changes in RxNorm content and its model do not lead to inconsistencies in their local representation.

The semantic relationships can be used to construct formal definitions of the concept for comparison between terminology versions. An emerging requirement is to extend the same principles of rigorous history tracking, applied to concepts, to the relationships as they effectively define the concepts in some contexts of use. Drug reformulations provide an example of the additional need for timestamps. RxNorm offers support for multiple levels of granularity. Eventually, when RxNorm is adopted as the drug data exchange standard, CDWs will receive SCD or SBD as source and land at a more specific level within RxNorm, but navigating the model for the timestamp of the order will always be a requirement for data interpretation. Without a change model, local and standard vocabulary divergence becomes unavoidable even when mapping at the coarsest level of granularity possible.

RxNorm offers rich up-to-date content and has a rigorous process of content curation, which allowed us to off-load content maintenance and restrict our focus to just maintaining the currency of our mapping. The reduced amount of work is a scalable long-term solution, which allows us to strike the appropriate trade-off between data interoperability and maintenance effort.


There are several advantages to adopting a standardized nomenclature, including reduced maintenance effort, the ability to easily support new data sources, and the increased clinical utility of a normalized drug model. The ability to cross check terminologies in local sources against a reference standard also assists with quality assurance in the CDW.

We continue to strive to capture the knowledge of all changes in a terminology, so that it can be automatically loaded into the CDW and have clinical data referring to the terminology retain their consistency. However, human review still remains an essential step in terminology maintenance. Ideally, standard terminologies would offer the ability to map at any appropriate level of granularity and facilitate use in applications, with the terminology including a coherent form of all the knowledge it has accumulated over time, thus permitting construction of a data view at a particular time point.

RxNorm offers rich content and an effective drug model. A more comprehensive representation of history with a formal change model that allows for computable history of all records will significantly ease integration and maintenance for systems that incorporate it.

Figure 1.
RxNorm model. Included are only the main term types: normalized forms of clinical drug names (SCD for generic and SBD for branded), all their constituent elements (IN, SCDC, SCDF, BN, SBDC, SBDF), and any applicable drug delivery devices (GPCK and BPCK). ...


1. Cimino JJ. Formal descriptions and adaptive mechanisms for change in controlled medical vocabularies. Methods of Information in Medicine. 1996;35:202–210.
2. Baorto D, Li L, Cimino JJ. Practical experience with the maintenance and auditing of a large medical ontology. Journal of Biomedical Informatics. 2009;42:494–503.
3. Oliver DE, Shahar Y. Development of a change model for a controlled medical vocabulary. AMIA Annu Fall Symp. 1997:605–609.
4. Bakhshi-Raiez F, Cornet R, de Keizer NF. Development and application of a framework for maintenance of medical terminological systems. J Am Med Inform Assoc. 2008 Sep–Oct;15(5):687–700.
5. Lowe HJ, Ferris TA, Hernandez PM, Weber SC. STRIDE – An integrated standards-based translational research platform. AMIA Annu Symp Proc. 2009 Nov 14:391–5.
6. Hartel FW, Fragoso G, Ong KL, Dionne R. Enhancing quality of retrieval through concept edit history. AMIA Annu Symp Proc. 2003:279–83.
7. Bodenreider O, Peters LB. A graph-based approach to auditing RxNorm. Journal of Biomedical Informatics. 2009;42:558–570.
8. Hernandez PN, Podchiyska T, Weber SC, Ferris TA, Lowe HJ. Automated mapping of pharmacy orders from two electronic health record systems to RxNorm within the STRIDE clinical data warehouse. AMIA Annu Symp Proc. 2009 Nov 14:244–8.
9. Peters L, Bodenreider O. Using the RxNorm web services API for quality assurance purposes. AMIA Annu Symp Proc. 2008 Nov 6:591–5.
12. Cimino JJ. Desiderata for controlled medical vocabularies in the twenty-first century. Methods of Information in Medicine. 1998;37:394–403.
