The goal of this study was to assess the validity of International Classification of Diseases, 10th Revision (ICD-10) administrative hospital discharge data and to determine whether the validity of coding for clinical conditions improved compared with ICD-9 Clinical Modification (ICD-9-CM) data.
We reviewed 4,008 randomly selected charts for patients admitted from January 1 to June 30, 2003 at four teaching hospitals in Alberta, Canada to determine the presence or absence of 32 clinical conditions and to assess the agreement between ICD-10 data and chart data. We then recoded the same charts using ICD-9-CM and determined the agreement between the ICD-9-CM data and chart data for recording those same conditions. The accuracy of ICD-10 data relative to chart data was compared with the accuracy of ICD-9-CM data relative to chart data.
Sensitivity values ranged from 9.3 to 83.1 percent for ICD-9-CM and from 12.7 to 80.8 percent for ICD-10 data. Positive predictive values ranged from 23.1 to 100 percent for ICD-9-CM and from 32.0 to 100 percent for ICD-10 data. Specificity and negative predictive values were consistently high for both ICD-9-CM and ICD-10 databases. Of the 32 conditions assessed, ICD-10 data had significantly higher sensitivity for one condition and lower sensitivity for seven conditions relative to ICD-9-CM data. The two databases had similar sensitivity values for the remaining 24 conditions.
The validity of ICD-9-CM and ICD-10 administrative data in recording clinical conditions was generally similar, though validity differed between coding versions for some conditions. The implementation of ICD-10 coding has not significantly improved the quality of administrative data relative to ICD-9-CM. Future assessments of this type are needed because the validity of ICD-10 data may improve as coders gain experience with the new coding system.
The World Health Organization adopted the first version of the International Classification of Diseases (ICD) in 1900 to internationally monitor and compare mortality statistics and causes of death. Since then, the classification has been revised periodically to accommodate new knowledge of disease and health. The sixth revision, published in 1949, was more radical than the previous five revisions because this edition made it possible to record information from patient charts to compile morbidity statistics. Subsequent revisions were made in 1958 (7th Edition), in 1968 (8th Edition), and in 1979 (9th Edition). The United States modified ICD-9 by specifying many categories and extending coding rubrics to describe the clinical picture in more detail. These modifications resulted in the publication of ICD-9 Clinical Modification (ICD-9-CM) in 1979 for coding diagnoses in patient charts (Commission on Professional and Hospital Activities 1986). The latest version, ICD-10, was introduced in 1992 (World Health Organization 1992).
The major differences between the ICD-10 and ICD-9-CM coding systems are: (1) the tabular list in ICD-10 has 21 categories of disease compared with 19 in ICD-9-CM, and the ICD-9-CM category of diseases of the nervous system and sense organs is divided into three ICD-10 categories (diseases of the nervous system; diseases of the eye and adnexa; and diseases of the ear and mastoid process); and (2) ICD-10 codes are alphanumeric whereas ICD-9-CM codes are numeric. Each ICD-10 code starts with a letter (i.e., A–Z), followed by two numeric digits, a decimal, and a digit (e.g., acute bronchiolitis due to respiratory syncytial virus is J21.0). In contrast, ICD-9-CM codes begin with a three-digit number (i.e., 001–999) followed by a decimal and up to two digits (e.g., acute bronchiolitis due to respiratory syncytial virus is 466.11).
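The structural difference between the two code formats can be illustrated with a short check. The regular expressions below are simplified sketches based only on the formats described above; they are not complete validators of either classification (ICD-9-CM, for instance, also has supplementary E- and V-code ranges not covered here).

```python
import re

# Simplified patterns for the two code formats described above
# (illustrative sketches, not exhaustive validators).
ICD10_PATTERN = re.compile(r"^[A-Z]\d{2}(\.\d{1,2})?$")   # e.g., J21.0
ICD9CM_PATTERN = re.compile(r"^\d{3}(\.\d{1,2})?$")       # e.g., 466.11

def code_system(code: str) -> str:
    """Guess which coding system a diagnosis code belongs to."""
    if ICD10_PATTERN.match(code):
        return "ICD-10"
    if ICD9CM_PATTERN.match(code):
        return "ICD-9-CM"
    return "unknown"

print(code_system("J21.0"))    # ICD-10
print(code_system("466.11"))   # ICD-9-CM
```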
Canada, Australia, Germany, and other countries have enhanced ICD-10 by adding more specific codes and released country-specific ICD-10 versions, such as ICD-10-Canada (ICD-10-CA; Canadian Institute for Health Information 2003). However, ICD-10-CA has maintained its comparability with ICD-10. The basic ICD-10 structure, scope, content, and definition of existing codes are not altered in ICD-10-CA. This means that none of the ICD-10 codes are relocated or deleted. ICD-10-CA mainly extends code character levels, from third and fourth levels of ICD-10 to fourth, fifth, or sixth character levels (e.g., from I15.0 for renovascular hypertension to I15.00 for benign renovascular hypertension and I15.01 for malignant renovascular hypertension). A few additions of third- and fourth-level codes were also included in ICD-10-CA in a manner consistent with the existing classification. All of these additional codes are indicated with red maple leaf symbols in ICD-10-CA coding manuals.
To continue studying the health care system and monitoring population health status with ICD-10 data, it is imperative to assess errors that could occur in the process of creating administrative data following the introduction of the new coding system. We conducted this study to evaluate the validity of ICD-10 administrative hospital discharge data and to determine whether validity improved compared with ICD-9-CM data. To achieve this aim, we reviewed randomly selected charts coded using ICD-10 at four Canadian teaching hospitals, determined the presence or absence of recorded conditions, and then separately recoded the same charts using ICD-9-CM. We then assessed the agreement between the originally coded ICD-10 administrative data and the chart review data, and between the recoded ICD-9-CM administrative data and the chart review data, for the same conditions. This permitted us to compare the accuracy of ICD-10 data relative to the chart review data with the accuracy of ICD-9-CM data relative to the chart review data.
At each of the four adult teaching hospitals in Alberta, Canada, professionally trained health record coders read through patients' medical charts to assign ICD-10-CA diagnoses that appropriately described the hospitalization. Each discharge record contained a unique identification number for the admission, a patient chart number, and up to 16 diagnoses. Alberta hospital discharge records have been coded with ICD-10-CA since April 1, 2002. To avoid quality issues in coding during the transition period between ICD-9-CM and ICD-10-CA, we obtained from the four study hospitals all records for patients aged ≥18 years discharged from January 1, 2003 through June 30, 2003 (i.e., 9 months after the implementation of ICD-10-CA). After stratifying records by hospital and assigning a random number to each record, we sorted them in ascending order of the random number and assigned a sequence number to each record within hospital. Aiming for a final sample of at least 1,000 records from each hospital, we located charts sequentially using the combination of patient chart number and admission identification number at each hospital. We ultimately reviewed 4,008 charts and could not locate 26 charts (i.e., a 99 percent success rate in locating charts).
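The stratify-randomize-sequence sampling scheme described above can be sketched in a few lines. This is only a schematic of the procedure, not the study's actual code; the function name, data layout, and seed are illustrative assumptions.

```python
import random

def sequence_records(records_by_hospital, seed=42):
    """Within each hospital stratum, attach a random number to each
    record, sort in ascending order of that number, and return the
    records in sequence -- charts would then be pulled in this order
    until the per-hospital target (e.g., 1,000 located charts) is met.
    NOTE: schematic sketch only; names and seed are illustrative."""
    rng = random.Random(seed)
    sequenced = {}
    for hospital, records in records_by_hospital.items():
        keyed = [(rng.random(), rec) for rec in records]
        keyed.sort(key=lambda pair: pair[0])  # ascending random number
        sequenced[hospital] = [rec for _, rec in keyed]
    return sequenced
```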
Before April 1, 2002, discharge data were coded with ICD-9-CM and therefore, in our sampling period of January 1 to June 30, 2003, ICD-9-CM data were not available in Alberta. To create a new ICD-9-CM database, we attempted to simulate hospital coders' coding in ICD-9-CM (i.e., “real-world coding”). Four coders who had ICD-9-CM coding experience at these hospitals recoded the 4,008 charts following the ICD-9-CM coding guidelines used at the four hospitals at the average speed of coding staff, spending about 15–20 minutes per chart. These coders were blinded to the ICD-10-CA codes assigned to each record.
Through multiple steps, we developed ICD-10 coding algorithms and enhanced the Deyo and Elixhauser ICD-9-CM coding algorithms to adapt the Charlson and Elixhauser clinical conditions to ICD-9-CM and ICD-10 administrative data. This multistep process is described in detail in a previously published paper (Quan et al. 2005). The ICD-10 coding algorithms used for this study did not contain country-specific ICD-10 codes. When applying the coding algorithms to define the 32 conditions in the ICD-9-CM and ICD-10-CA databases, using up to 16 diagnosis coding fields, we used the SAS function "substr" to truncate the length of ICD-10 codes in the ICD-10-CA database. We therefore defined the 32 conditions using ICD-10 codes rather than ICD-10-CA codes, avoiding any influence of Canadian extended digits or additional codes on these conditions. This methodological approach was intentional, to increase the international relevance of our findings. We chose the Charlson index (Charlson et al. 1987) and Elixhauser measures (Elixhauser et al. 1998) because they have been widely used by health researchers to measure burden of disease or case mix with administrative data (Southern, Quan, and Ghali 2004; Sundararajan et al. 2004; Needham et al. 2005).
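The truncation step can be illustrated with Python string slicing standing in for the SAS substr call. The five-character cut-off is an assumption for illustration (the text does not state the exact substr arguments); it reproduces the I15.00 → I15.0 example given earlier.

```python
def truncate_to_icd10(code: str, length: int = 5) -> str:
    """Drop Canadian extension characters from an ICD-10-CA code,
    mimicking the SAS substr() truncation described above.
    NOTE: the 5-character cut-off is an illustrative assumption."""
    return code[:length]

# The ICD-10-CA extensions I15.00 / I15.01 both collapse back to I15.0
print(truncate_to_icd10("I15.00"))  # I15.0
print(truncate_to_icd10("I15.01"))  # I15.0
```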
Two reviewers who have nursing backgrounds and health records coding training, as well as extensive chart review experience, reviewed the randomly selected charts to determine the presence or absence of 32 conditions. The chart reviewers followed the definitions described by Charlson et al. (1987) to determine the presence or absence of the 14 conditions that constitute the Charlson index. To determine the presence or absence of the remaining 18 Elixhauser clinical conditions in the charts, we developed explicit definitions by describing all of the ICD-10 codes that were used to define the 18 conditions, with the clinical terms used in the ICD-10 manuals.
The two reviewers underwent training in data extraction with the lead investigator (H. Q.). In the training session, the definitions of the study variables were discussed and eight charts were reviewed. Any discrepancies between the two reviewers in reviewing these eight charts were discussed and resolved by consensus involving a third party. The agreement between the two reviewers was then evaluated: both reviewers independently extracted clinical conditions from 70 charts from one of the teaching hospitals using a predesigned standard form. Of the 32 conditions extracted from these 70 charts, 17 conditions had near perfect agreement (κ: 0.81–1.0), 10 had substantial agreement (κ: 0.61–0.80), and four had moderate agreement (κ: 0.41–0.60) according to Landis and Koch (1977) criteria. κ could not be calculated for the remaining condition (i.e., psychosis) because of its low frequency in the sample. After the agreement study, the two reviewers started the chart reviews. During data collection, they discussed cases in which determining the presence of a condition was uncertain, to ensure consistency between them.
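For a binary condition judged by two raters, Cohen's κ reduces to a simple 2 × 2 computation. The functions below are a generic sketch (not the study's software), with the Landis and Koch bands used in the text applied to the result.

```python
def cohens_kappa(both_present, r1_only, r2_only, both_absent):
    """Cohen's kappa for two raters judging one binary condition.
    r1_only: rater 1 recorded present, rater 2 absent (and vice versa)."""
    n = both_present + r1_only + r2_only + both_absent
    observed = (both_present + both_absent) / n
    # Chance agreement from each rater's marginal present/absent rates
    expected = ((both_present + r1_only) * (both_present + r2_only)
                + (both_absent + r2_only) * (both_absent + r1_only)) / n ** 2
    return (observed - expected) / (1 - expected)

def landis_koch(kappa):
    """Landis and Koch (1977) agreement bands used in the text."""
    if kappa > 0.80:
        return "near perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "slight or poor"
```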
The two reviewers examined the entire chart, including the cover page, discharge summaries, narrative summaries, pathology reports (including autopsy reports), trauma and resuscitation records, admission notes, consultation reports, surgery/operative reports, anesthesia reports, physician daily progress notes (nursing notes excluded), physician orders, diagnostic reports, and transfer notes for evidence of any of the 32 conditions. This detailed chart review process took approximately 1 hour per chart.
Aside from the difference in average time spent per chart between reviewers (1 hour) and coders (15–20 minutes), the reviewers determined the presence or absence of medical conditions based on all documented information in the chart, including diagnostic imaging and laboratory results. This contrasts with general coding guidelines (Canadian Institute for Health Information 2007), which instruct coders to confine their coding to clinical problems, conditions, or circumstances that are identified in the record by the treating physicians as the clinically significant reason for the patient's admission, or that require or influence evaluation, treatment, management, or care. Coders do not typically code problems that fail to meet these requirements, whereas the reviewers who conducted our “reference standard” chart review included them regardless of each condition's significance for resource use during hospitalization. Coders are also instructed that when a condition is suggested by diagnostic test results, they should code it only if it has been confirmed by physician documentation.
Three databases were thus created for the same hospital discharges: (1) ICD-10 discharge abstract data, (2) ICD-9-CM discharge abstract data, and (3) chart review data. The databases allowed us to calculate sensitivity, specificity, positive predictive value, and negative predictive value for each condition recorded in ICD-10 hospital discharge data and then in ICD-9-CM discharge data, accepting the chart review data as a “reference standard.” Recognizing that some might question the use of chart review data as a reference standard, the κ statistic was also used to assess the agreement between the two databases for individual conditions. For each condition identified in the chart data, McNemar's test was used to compare the sensitivity and specificity of ICD-10 versus ICD-9-CM data relative to the chart review data for detecting the conditions. To implement McNemar's statistical test for estimates of sensitivity and specificity, records with and then without a given condition present, respectively, based on chart data, were selected and agreement between ICD-9-CM and ICD-10 was tested in the subsample.
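Under the chart review “reference standard,” the four validity measures come from a standard 2 × 2 table, and the McNemar comparison uses only the discordant pairs within the selected subsample. The sketch below is a generic illustration of these definitions, not the study's analysis code (which the text implies was SAS).

```python
def validity_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV of administrative data
    against the chart review reference standard.
    tp: coded present & in chart;  fp: coded present & not in chart;
    fn: coded absent & in chart;   tn: coded absent & not in chart."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def mcnemar_statistic(icd9_only, icd10_only):
    """McNemar chi-square on discordant pairs within the subsample of
    charts where the condition is present (for sensitivity) or absent
    (for specificity): icd9_only = detected by ICD-9-CM but not ICD-10,
    icd10_only = the reverse. No continuity correction is applied."""
    return (icd9_only - icd10_only) ** 2 / (icd9_only + icd10_only)
```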
Table 1 presents the frequency of the 32 conditions by data source among the 4,008 records. Compared with the chart review data, the ICD-9-CM data underreported 29 conditions, slightly overreported two conditions (diabetes with complications and renal failure), and equivalently reported one condition (deficiency anemia). The ICD-10 data underreported 31 conditions and slightly overreported one condition (renal failure). ICD-10 data had a significantly lower frequency for eight conditions and a higher frequency for three conditions compared with ICD-9-CM data.
Table 2 presents five quantitative indices, by data source, to assess whether the administrative data accurately reproduced what was recorded in the patient charts. Sensitivity measures the extent to which conditions present in the chart review data were recorded in the administrative data, and it varied greatly by condition for both ICD-9-CM and ICD-10. Metastatic cancer had the highest sensitivity (83.1 percent in ICD-9-CM and 80.8 percent in ICD-10) and weight loss the lowest (9.3 percent in ICD-9-CM and 12.7 percent in ICD-10). Compared with ICD-10 data, ICD-9-CM data had significantly higher sensitivity for seven conditions and lower sensitivity for one condition; sensitivity for the remaining 24 conditions was similar between ICD-9-CM and ICD-10 (see Table 2 and Figure 1). Positive predictive value, the extent to which a condition present in the administrative data was also present in the chart review data, was higher than 75 percent for 20 conditions in ICD-9-CM and 18 conditions in ICD-10 data. Specificity measures the extent to which conditions absent in the charts were reported as absent in the administrative data, and negative predictive value the extent to which a condition absent in the administrative data was truly absent according to the chart review data. Specificity was higher than 98 percent for 29 conditions in ICD-9-CM (the exceptions being 96.5 percent for solid tumor without metastasis, 97.7 percent for drug abuse, and 94.4 percent for depression) and for all 32 conditions in ICD-10. Negative predictive value was higher than 98 percent for 12 conditions in ICD-9-CM and 13 conditions in ICD-10. Cardiac arrhythmias had the lowest negative predictive value in both datasets (85.8 percent in ICD-9-CM and 85.3 percent in ICD-10).
The κ values indicate near perfect agreement (κ: 0.81–1.0) between coded data and chart review data for two conditions in ICD-9-CM and one in ICD-10, substantial agreement (κ: 0.61–0.80) for 13 conditions in ICD-9-CM and 11 in ICD-10, moderate agreement (κ: 0.41–0.60) for 10 conditions in ICD-9-CM and 15 in ICD-10, and fair agreement (κ: 0.21–0.40) for six conditions in ICD-9-CM and five in ICD-10. κ values relative to chart review data were generally similar for ICD-9-CM and ICD-10 for 29 conditions but discrepant for HIV/AIDS, hypothyroidism, and dementia (see Table 2 and Figure 2).
Our study documented the validity of ICD-9-CM and ICD-10 coding systems in coding clinical information. We found that ICD-10 administrative data were coded reasonably well on 32 conditions but that some conditions tended to be underdetected in ICD-10 data and had low validity relative to chart review data. The validity of ICD-10 data was generally comparable with that of ICD-9-CM data in recording clinical information, although ICD-9-CM coding demonstrated better sensitivity for a few conditions.
We anticipated that the new coding system had the potential to produce better validity than ICD-9-CM because the new structure of ICD-10 codes may enhance the accuracy and specificity of code identification. In this regard, ICD-10 partially reflects the advancement of medical knowledge over the past two decades. Yet, despite this potential, our early validity assessment (performed 9 months after the implementation of ICD-10 coding) shows that sensitivity in ICD-10 was significantly lower than in ICD-9-CM for myocardial infarction, hypertension, hypothyroidism, fluid and electrolyte disorders, obesity, drug abuse, and depression, but higher in ICD-10 than in ICD-9-CM for dementia. The first possible explanation for the lower ICD-10 sensitivity for several of these conditions is that coders were still in the early portion of an ICD-10 learning curve. The higher sensitivity for dementia in ICD-10, meanwhile, may be related to the fact that ICD-10 groups dementias together as dementia in Alzheimer's disease (F00), vascular dementia (F01), dementia in other diseases classified elsewhere (F02), and unspecified dementia (F03), whereas ICD-9-CM does not group dementias together in this way. The detailed grouping of “dementia” in ICD-10 may thus make it easier for coders to locate dementia codes, with the downstream result being more accurate coding. By contrast, there are no substantial enhancements in ICD-10 relative to ICD-9-CM in disease grouping and/or code descriptions for myocardial infarction and hypertension; for example, the ICD-10 and ICD-9-CM hypertension codes matched perfectly (I10.x/401.x–I15.x/405.x). The second possible explanation is that the coders who recoded charts in ICD-9-CM performed better than the regular coders who coded ICD-10. About 16,000 charts were coded per year in Alberta.
Coders rotate among hospital sites and are supervised by one manager within a health region. We recruited four coders who were working in the Health Records departments of the teaching hospitals studied and instructed them to code charts as they routinely do, following the usual coding guidelines. Our coders coded an average of 5.3 diagnoses per chart (median of four) in ICD-9-CM, very similar to the provincial average of 5.1 diagnoses per chart (median of four) in fiscal year 2001/2002 ICD-9-CM data. It therefore seems unlikely that the study coders performed better than regular coders. The third possible explanation is that our coders may have been randomly assigned to recode in ICD-9-CM some of the same charts that they had earlier coded in ICD-10 through their primary employment, thereby inflating the apparent similarity in performance between the two coding systems. While possible, we consider such a scenario to be infrequent and unlikely to have a major effect on the quality of our recoding. We randomly selected only 4,008 charts out of a total of about 70,000 (5.7 percent). Given these numbers, it is quite unlikely that one of our coders coded the same randomly selected chart in both ICD-9-CM and ICD-10. And even if this did occur on a few occasions, it would be quite difficult for a coder to remember much about the first time they coded a given chart. We therefore doubt that this scenario occurred often or affected our results and conclusions significantly.
ICD-9-CM administrative data have been validated using various methodologies for various purposes. Hsia et al. (1992) assessed the accuracy of claims data by measuring incorrect grouping of clinically interrelated diagnostic codes with diagnosis-related groups (DRGs) and found that incorrect assignment of DRGs decreased significantly from 21 percent in 1985 to 15 percent in 1988. Many other investigators (Iezzoni et al. 1988; Jollis et al. 1993; Romano and Mark 1994; Geraci et al. 1997; Muhajarine et al. 1997; Weingart et al. 2000; Best et al. 2002; Quan, Parson, and Ghali 2002; Romano et al. 2002; Lee et al. 2005; Yasmeen et al. 2006) conducted validation studies focusing on comorbidities, clinical conditions, and complications of substandard care, and found that administrative data are accurately coded for many severe or life-threatening conditions such as myocardial infarction and cancer, but that some clinically nonspecific and symptomatic conditions such as rheumatologic disease, are less accurately coded.
The introduction of the new coding system, ICD-10, raises new questions about the coding accuracy and completeness of clinical information recorded in administrative data, and about whether the magnitude of coders' errors changed between the ICD-9-CM and ICD-10 coding systems. Anderson and Rosenberg (2003) analyzed cause of death before and after implementation of ICD-10 in the United States. They found that the ranking of leading causes of death changed substantially with the change in classification system from ICD-9 to ICD-10: for example, chronic liver disease and cirrhosis, the 10th leading cause of death under ICD-9, dropped out of the top 10 list under ICD-10, and Alzheimer's disease entered the top 10 causes of death under ICD-10. Janssen and Kunst (2004) analyzed long-term cause-specific mortality in six European countries and noted discontinuities in cause-specific mortality trends due to changes in the coding system. Kokotailo and Hill (2005) reviewed charts from ICD-9-CM and ICD-10 admission records to determine whether the ICD-10 coding system offered potential improvements over ICD-9-CM for stroke and stroke risk factors. They found that stroke and stroke risk factors were coded equally well with ICD-9-CM and ICD-10. Further, atrial fibrillation, coronary artery disease/ischemic heart disease, diabetes mellitus, and hypertension were recorded significantly better than history of cerebrovascular disease, hyperlipidemia, renal failure, and tobacco use in both ICD-9-CM and ICD-10 databases. Henderson, Shepheard, and Sundararajan (2006) compared routinely coded ICD-10 data with audit data from public hospitals in Australia and demonstrated that the transition of coding from ICD-9-CM to ICD-10 did not noticeably affect the quality of administrative data.
Our study of dually coded data thus adds to this growing body of literature on ICD-10 validity, and like previous studies suggests that ICD-10 data have generally comparable validity, but that they do not (at least yet) have better validity than do ICD-9-CM data.
A number of conditions had poor validity in both ICD-9-CM and ICD-10 administrative data. The poor coding of conditions such as weight loss, obesity, and deficiency anemia may reflect the fact that coders do not code these conditions even when they are documented in charts, because they may not be explicitly mentioned by nurses or physicians in clinical notes and because they may not affect length of stay, health care, or therapeutic treatment. Additionally, coders may intentionally not code these conditions because of the limited amount of time available to code each chart.
This study has limitations. A first limitation is that we reviewed charts only in teaching hospitals; we acknowledge that a study of nonteaching hospitals is also needed. Iezzoni et al. (1988, 1990) reported that the validity of administrative data varies between teaching and nonteaching hospitals: at nonteaching hospitals, acute clinical conditions tend to be more accurately documented, but chronic coexisting diseases are less completely recorded than at teaching hospitals. A second limitation is that we employed chart data extracted by reviewers as a “reference standard” to assess the validity of ICD-9-CM and ICD-10 data. Such a criterion standard depends on the quality of the charts and can reflect only part of the validity of administrative data. Ideally, a validity study should assess whether a coded condition is truly present in the patient; this depends on whether the condition is recorded correctly in the chart and then coded precisely in the administrative data. This study therefore does not capture errors that could occur when clinicians take histories, make diagnoses, or record clinical information in charts (O'Malley et al. 2005). A third limitation is that the validity of administrative data may vary across hospitals, regions, and countries; our findings may therefore not be applicable to other regions.
Weighing against these limitations are some notable study strengths. Our study is perhaps the first to undertake a direct comparison of ICD-9-CM versus ICD-10 in dually coded administrative data. We studied a large number of hospital discharge records and thus achieved good precision of our validity measures for many of the conditions studied. We also used new ICD-9-CM and ICD-10 coding algorithms (Quan et al. 2005) to define conditions that are likely to optimize administrative data validity for capturing the clinical conditions.
In conclusion, our analysis of a unique dually coded database demonstrated that ICD-9-CM and ICD-10 administrative data were coded reasonably well and had similar validity in recording clinical condition information. The implementation of ICD-10 coding did not lead to an improvement in the coding of clinical conditions. However, we assessed hospital discharge data quality relatively early after implementation of ICD-10. The longer term impact of ICD-10 on data quality will need to be assessed in future studies.
This study was supported by an operating grant from the Canadian Institutes of Health Research, Canada. Dr. Quan is supported by a Population Health Investigator Award from the Alberta Heritage Foundation for Medical Research, Edmonton, Alberta, Canada and by a New Investigator Award from the Canadian Institutes of Health Research. Dr. Ghali is supported by a Senior Health Scholar Award from the Alberta Heritage Foundation for Medical Research, Alberta, Canada, and by a Government of Canada Research Chair in Health Services Research. The authors thank 3M for providing 3M™ Codefinder™ ICD-9-CM code searching software.
IMECCHI (International Methodology Consortium for Coded Health Information) investigators include Bernard Burnand, University of Lausanne, Switzerland; Cyrille Colin, University of Lyon, France; Chantal Couris, University of Lyon, France; Carolyn De Coster, University of Manitoba, Canada; Saskia Drössler, Niederrhein University of Applied Sciences, Germany; Alan Finlayson, the National Health Service in Scotland, U.K.; Kiyohide Fushimi, Tokyo Medical and Dental University Graduate School, Japan; Min Gao, British Columbia Provincial Public Health Services Authority, Canada; William Ghali, University of Calgary, Canada; Patricia Halfon, University of Lausanne, Switzerland; Brenda Hemmelgarn, University of Calgary, Canada; Karin Humphries, University of British Columbia, Canada; Jean-Marie Januel, University of Lausanne, Switzerland; Helen Johansen, Statistics Canada; Lisa Lix, University of Manitoba, Canada; Jean-Christophe Luthi, University of Lausanne, Switzerland; Jin Ma, Jiaotong University, China; Hude Quan, University of Calgary, Canada; Patrick Romano, University of California at Davis, U.S.A.; Leslie Roos, University of Manitoba, Canada; Fiona Shrive, University of Calgary, Canada; Vijaya Sundararajan, Victorian Department of Human Services, Australia; Jack Tu, University of Toronto, Canada; Sandrine Touzet, University of Lyon, France; and Greg Webster, Canadian Institute for Health Information, Canada.
Disclosures: No conflicts of interest.