Our study documented the validity of ICD-9-CM and ICD-10 coding systems in coding clinical information. We found that ICD-10 administrative data were coded reasonably well on 32 conditions but that some conditions tended to be underdetected in ICD-10 data and had low validity relative to chart review data. The validity of ICD-10 data was generally comparable with that of ICD-9-CM data in recording clinical information, although ICD-9-CM coding demonstrated better sensitivity for a few conditions.
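To make the validity measures concrete, the following is a minimal sketch of how sensitivity could be computed for one condition, with chart review as the reference standard. The counts are hypothetical and are not study data. The `TwoByTwo` class and the function names are illustrative assumptions. Sensitivity is the measure named in this discussion; positive predictive value is included only as a common companion measure.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for one condition; chart review is the reference standard."""
    tp: int  # condition present in chart and coded in administrative data
    fp: int  # coded in administrative data but not found in chart
    fn: int  # present in chart but not coded in administrative data
    tn: int  # absent from both chart and administrative data


def sensitivity(t: TwoByTwo) -> float:
    # Share of chart-documented cases that the administrative data captured.
    return t.tp / (t.tp + t.fn)


def ppv(t: TwoByTwo) -> float:
    # Share of coded cases that chart review confirmed.
    return t.tp / (t.tp + t.fp)


# Hypothetical counts for the same set of charts coded in each system.
icd9 = TwoByTwo(tp=80, fp=10, fn=20, tn=890)
icd10 = TwoByTwo(tp=70, fp=10, fn=30, tn=890)

for name, table in (("ICD-9-CM", icd9), ("ICD-10", icd10)):
    print(f"{name}: sensitivity={sensitivity(table):.3f}, PPV={ppv(table):.3f}")
```

In this made-up example the ICD-10 data miss more chart-documented cases, so sensitivity is lower even though the number of false-positive codes is unchanged. This is the pattern described here as underdetection.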
We anticipated that the new coding system had the potential to produce better validity relative to ICD-9-CM due to the new structure of codes in ICD-10 that may enhance the accuracy and specificity of code identification. In this regard, ICD-10 partially reflects the advancement of medical knowledge of the past two decades. Yet, despite this potential for greater validity, our early validity assessment (performed 9 months after the implementation of ICD-10 coding) shows that sensitivity in ICD-10 was significantly lower than that in ICD-9-CM for myocardial infarction, hypertension, hypothyroidism, fluid and electrolyte disorders, obesity, drug abuse, and depression but higher in ICD-10 than in ICD-9-CM for dementia. The first possible explanation for the lower sensitivity in ICD-10 for several of the conditions is that coders were still in the early portion of an ICD-10 learning curve. The high sensitivity for dementia in ICD-10, meanwhile, may be related to the fact that ICD-10 groups dementias together as dementia in Alzheimer's disease (F00), vascular dementia (F01), dementia in other diseases classified elsewhere (F02), and unspecified dementia (F03). In contrast, ICD-9-CM does not group dementias together in the coding system as is done in ICD-10. The detailed grouping of “dementia” in ICD-10 may thus facilitate the work of coders in locating dementia codes, with the downstream result being an increase in the accuracy of coding. In contrast, there are no substantial enhancements in ICD-10 relative to ICD-9-CM in disease grouping and/or code descriptions for myocardial infarction and hypertension. For example, ICD-10 and ICD-9-CM were perfectly matched for hypertension codes I10.x/401.x-I15.x/405.x. The second possible explanation is that our coders who recoded charts in ICD-9-CM performed better than regular coders who coded ICD-10. About 16,000 charts were coded per year in Alberta. 
Coders rotate among hospital sites and are supervised under one manager within a health region. We recruited four coders who were working in the Health Records departments of the teaching hospitals studied and instructed them to code charts as they routinely do, following usual coding guidelines. Our coders coded 5.3 diagnoses per chart on average, with a median of four diagnoses, in ICD-9-CM, which is very similar to the provincial average of 5.1 diagnoses per chart and median of four diagnoses in fiscal year 2001/2002 ICD-9-CM data. It therefore seems unlikely that the study coders performed better than regular coders. The third possible explanation is that our coders may have been randomly assigned to recode in ICD-9-CM some of the same charts that they had earlier coded in ICD-10 through their primary employment, thereby inflating the apparent similarity in performance between the two coding systems. While possible, we consider such a scenario to be infrequent and unlikely to have a major effect on the quality of our recoding. We randomly selected only 4,008 charts out of a total of about 70,000 (5.7 percent). Bearing these numbers in mind, it is quite unlikely that one of our coders coded the same randomly selected chart in both ICD-9-CM and ICD-10. Even if this did occur on a few occasions, it would be quite difficult for a coder to remember much about the first time they coded a given chart. We therefore doubt that this scenario occurred often or affected our results and conclusions significantly.
The introduction of the new coding system, ICD-10, raises new questions about the coding accuracy and completeness of clinical information recorded in administrative data and whether there have been changes in the magnitude of coders' errors between ICD-9-CM and ICD-10 coding systems. Anderson and Robenberg (2003) analyzed cause of death before and after implementation of ICD-10 in the United States. They found that the ranking of leading causes of death changed substantially because of the change in classification system from ICD-9 to ICD-10. For example, chronic liver disease and cirrhosis, the 10th leading cause of death under ICD-9, dropped out of the top 10 list under ICD-10, and Alzheimer's disease became one of the top 10 causes of death in ICD-10. Janssen and Kunst (2004) analyzed long-term cause-specific mortality in six European countries and noticed discontinuities in trends in cause-specific mortality due to changes in the coding system. Kokotailo and Hill (2005) reviewed charts from ICD-9-CM and ICD-10 admission records to determine whether the ICD-10 coding system had potential improvements over ICD-9-CM for stroke and stroke risk factors. They found that stroke and stroke risk factors were coded equally well with ICD-9-CM and ICD-10. Further, atrial fibrillation, coronary artery disease/ischemic heart disease, diabetes mellitus, and hypertension were recorded significantly better than history of cerebrovascular disease, hyperlipidemia, renal failure, and tobacco use in both ICD-9-CM and ICD-10 databases. Henderson, Shepheard, and Sundararajan (2006) compared routinely coded ICD-10 data with audit data from public hospitals in Australia and demonstrated that the transition of the coding from ICD-9-CM to ICD-10 did not noticeably affect the quality of administrative data. Our study of dually coded data thus adds to this growing body of literature on ICD-10 validity and, like previous studies, suggests that ICD-10 data have generally comparable validity, but that they do not (at least yet) have better validity than do ICD-9-CM data.
A number of conditions had poor validity in both ICD-9-CM and ICD-10 administrative data. The poor coding of certain conditions such as weight loss, obesity, and certain anemias may relate to the fact that coders do not code these conditions even if they are documented in charts, because they may not be explicitly mentioned by nurses or physicians in clinical notes, and also because they may not affect length of stay, health care, or therapeutic treatment. Additionally, coders may intentionally not code these conditions because of the limited amount of time given to code each chart.
This study has limitations. A first limitation is that we reviewed charts only in teaching hospitals. We acknowledge that a study of nonteaching hospitals is also needed. Iezzoni et al. (1988) reported that the validity of administrative data varies between teaching and nonteaching hospitals. At nonteaching hospitals, acute clinical conditions tend to be more accurately documented but chronic coexisting diseases are less completely recorded than at teaching hospitals. A second limitation is that we employed chart data extracted by reviewers as a “reference standard” to assess the validity of ICD-9-CM and ICD-10 data. Such a criterion standard depends on the quality of charts and could only reflect part of the validity of administrative data. Ideally, a validity study should assess whether a condition that is truly present in a patient appears in the administrative data; this depends on whether the condition is recorded correctly in the chart and then coded precisely in the administrative data. Therefore, this study does not capture errors that could occur when clinicians take histories, make diagnoses, or record clinical information on charts (O'Malley et al. 2005). A third limitation is that the validity of administrative data may vary across hospitals, across regions, and across countries. Therefore, our findings may not be applicable to other regions.
Weighing against these limitations are some notable study strengths. Our study is perhaps the first to undertake a direct comparison of ICD-9-CM versus ICD-10 in dually coded administrative data. We studied a large number of hospital discharge records and thus achieved good precision of our validity measures for many of the conditions studied. We also used new ICD-9-CM and ICD-10 coding algorithms (Quan et al. 2005) to define conditions that are likely to optimize administrative data validity for capturing the clinical conditions.
In conclusion, our analysis of a unique dually coded database demonstrated that ICD-9-CM and ICD-10 administrative data were coded reasonably well and had similar validity in recording clinical condition information. The implementation of ICD-10 coding did not lead to an improvement in the coding of clinical conditions. However, we assessed hospital discharge data quality relatively early after implementation of ICD-10. The longer term impact of ICD-10 on data quality will need to be assessed in future studies.