Timely, valid, and reliable information on causes of death by age and sex is a critical input into public health planning, program implementation, and evaluation. Most high-income and many middle-income countries have the benefit of a complete vital registration system in which the vast majority of deaths get a certificate of death completed by a physician [1]. These information systems should in principle provide public health communities in each country with critical information needed to guide their programs. Nevertheless, analyzing levels and trends in causes of death, even in countries with well-functioning cause-of-death registration systems, remains challenging for a number of reasons related to the process of completing death certificates and the coding of each death certificate following standardized international rules.
Even with a physician-completed death certificate, assignment of the underlying cause of death can be problematic. In the Second Annual Report of the Registrar General of Great Britain in 1840, William Farr presented the statistics of causes of death (CoD), defined as "diseases, which terminate in the extinction of existence," but Farr highlighted the concern that "...the attention of the observer was less attracted to this class of facts, and overlooking the proximate cause, that is, the internal morbid process..." In that report, he also criticized the use of vague categories like "sudden death," "natural death," "visitation of God," and "old age," but he admitted that in some cases, no particular cause of death could be identified [2]. All these criticisms remain relevant today.
Analysis of cause-of-death data is intimately linked to the evolution of the International Statistical Classification of Diseases and Related Health Problems (ICD). The classification was originally known as the International List of Causes of Death; the modern era for the ICD began when the World Health Assembly approved the sixth revision of the ICD in 1948 [3]. The new classification sought to establish an international standard for terminology and nosological criteria to attribute disease names and classify pathologies. Adoption of the ICD by the World Health Organization (WHO) also included a commitment by Member States of WHO to report national statistics based on the ICD. ICD-6 also included the adoption of an international medical certificate of CoD, an international agreement about the underlying cause of death (UCD) as the main cause to be tabulated, and the rules for selecting the UCD.
Despite the adoption of an international death certificate, the principle of identifying the UCD, and a standard list of causes codified in the revisions of the ICD, at least three problems create issues of comparability for public health analysis among participating countries. First, each time there is a change in the ICD, the set of causes and the codes assigned to each underlying cause change substantially. Producing time series of cause-of-death data requires mapping a coherent set of causes across revisions, a practice often known as bridge coding [4]. For example, to produce a time series spanning the 20th century, one would need to map from the International List of Causes of Death (ILCD 1-5) to the International Statistical Classification of Diseases and Related Health Problems (ICD 6-10). Whereas the ILCD had only been used to classify mortality, the ICD expanded to include both mortality and morbidity, thus increasing the number of causes from 179 to 20,000 [6]. Time series analyses [7] for selected causes have attempted to map national ICD revisions over time, but idiosyncratic national use of the ICD has limited more general approaches to bridge coding that are applicable across all countries. In addition, the WHO database documentation [10] makes no mention of the ICD sixth revision, even though at least 40 countries used this version during the period 1949-1957 and sent data to the Pan American Health Organization (PAHO) and WHO.
Second, due to the increase in the number of causes, tabulation lists were introduced starting with ICD-6. These lists provide a much shorter set of aggregate codes intended to facilitate cause-of-death reporting in countries with more limited capacity and for communication purposes. A substantial component of historical vital registration data is only available for these tabulation lists, including ICD-7 Tabulation A and B, ICD-8 Tabulation A and B, Basic Tabulation List (BTL) in ICD-9, and mortality tabulation in ICD-10. As with any aggregation procedure, substantial information is lost as compared to the fully disaggregated ICD data that were used to create these lists. For some causes, such as cardiomyopathy, pericarditis, endocarditis, and myocarditis (in BTL and ICD-7 Tab A), or source of burning and exposure to inanimate or mechanical forces in ICD-10 Tabulation list 1, assessing time trends requires some way of breaking down the tabulated data into component causes.
Third, with the advent of the sixth revision, the ICD has been used not only to code deaths by underlying cause of death but also to code other types of medical information, such as reasons for admission to or discharge from a hospital. The introduction of multiple purposes for the ICD has led to the addition of many codes for causes that should not be considered underlying causes of death. WHO has recognized this problem by producing lists of ICD codes under the heading "List of conditions unlikely to cause death" in the appendix of Volume 2 of the second edition of the ICD [3]. Despite these recommendations from WHO, these codes are frequently used as underlying causes of death. More generally, some ICD codes used to assign cause of death are likely misclassifications from a public health perspective.
In 1996, as part of the assessment of the Global Burden of Disease (GBD), Murray and Lopez [11] introduced the term "garbage coding" for the practice of assigning deaths to causes that are not useful for public health analysis of cause-of-death data. While some practitioners may object to the term "garbage code" as pejorative, alternative terms have not yet caught on in the literature. We follow this practice and use the term garbage code (GC) to refer to all deaths assigned to codes that should be redistributed to enhance the validity of public health analysis. The variable use of GCs across countries and over time profoundly limits meaningful comparisons of causes of death; for this reason, WHO and other analysts have sought to reassign deaths coded to GCs to other causes following various methods [11].
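To make the idea of reassigning GC deaths concrete, the following minimal sketch shows one simple possibility, proportional redistribution, in which deaths coded to a GC are split among a set of target causes in proportion to the deaths already observed for those targets. It is an illustration only and is not presented as the method used in this paper or by WHO. The function name `redistribute_garbage`, the cause labels (`cause_A`, `cause_B`, `GC_1`), the target sets, and the death counts are all hypothetical.

```python
def redistribute_garbage(deaths, garbage_targets):
    """Proportionally reassign deaths coded to garbage codes (GCs).

    deaths: dict mapping cause label -> death count (GCs included).
    garbage_targets: dict mapping each GC -> list of target causes.
    Assumption: target causes are not themselves garbage codes.
    """
    # Start from the deaths already assigned to non-garbage causes.
    result = {c: float(n) for c, n in deaths.items() if c not in garbage_targets}
    for gc, targets in garbage_targets.items():
        gc_deaths = deaths.get(gc, 0)
        if gc_deaths == 0 or not targets:
            continue
        observed = sum(deaths.get(t, 0) for t in targets)
        for t in targets:
            if observed > 0:
                # Split in proportion to observed deaths in the targets.
                share = deaths.get(t, 0) / observed
            else:
                # No observed target deaths: fall back to an equal split.
                share = 1.0 / len(targets)
            result[t] = result.get(t, 0.0) + gc_deaths * share
    return result


# Hypothetical counts: 40 deaths coded to GC_1 are reassigned to A and B.
deaths = {"cause_A": 300, "cause_B": 100, "GC_1": 40}
garbage_targets = {"GC_1": ["cause_A", "cause_B"]}
redistributed = redistribute_garbage(deaths, garbage_targets)
print(redistributed)
```

The total number of deaths is preserved by construction; only their allocation across causes changes. In practice, the choice of target causes and the splitting rule would vary by GC, age, and sex.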
Given the importance of cause-of-death data for public health analysis, we attempted in this paper to build on prior cause-of-death analysis work [1] and to create a more detailed approach to these problems of comparability of ICD-coded cause-of-death data. Our goal was to maximize the public health utility of cause-of-death data. To achieve this, we created a public health cause-of-death list building on the Global Burden of Disease Study, mapped this cause list across ICD revisions, and provided a comprehensive framework for identifying and redistributing deaths assigned to GCs. We illustrated this approach using an extensive database of publicly available cause-of-death data for more than 100 countries spanning 1950 to 2008.
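As a rough illustration of what mapping a cause list across ICD revisions involves, the sketch below uses a lookup table keyed by revision and code to assign revision-specific codes to a common cause list, then sums deaths by cause and year. The table `BRIDGE`, the function `build_time_series`, and the death counts are hypothetical; the few codes shown are a small illustrative subset and do not reproduce the mapping developed in this paper.

```python
from collections import defaultdict

# Illustrative bridge table: (ICD revision, code) -> common cause label.
BRIDGE = {
    ("ICD-9", "410"): "ischemic heart disease",
    ("ICD-10", "I21"): "ischemic heart disease",
    ("ICD-9", "011"): "tuberculosis",
    ("ICD-10", "A15"): "tuberculosis",
}


def build_time_series(records, bridge=BRIDGE):
    """Aggregate (year, revision, code, deaths) records by common cause.

    Returns (series, unmapped): series maps (cause, year) -> deaths;
    unmapped collects deaths whose code has no entry in the bridge table.
    """
    series = defaultdict(float)
    unmapped = defaultdict(float)
    for year, revision, code, deaths in records:
        cause = bridge.get((revision, code))
        if cause is None:
            # Keep unmapped codes visible rather than silently dropping them.
            unmapped[(revision, code)] += deaths
        else:
            series[(cause, year)] += deaths
    return dict(series), dict(unmapped)


# Hypothetical records coded under two different revisions.
records = [
    (1990, "ICD-9", "410", 120),
    (1990, "ICD-9", "011", 15),
    (2000, "ICD-10", "I21", 95),
    (2000, "ICD-10", "A15", 9),
    (2000, "ICD-10", "R99", 4),
]
series, unmapped = build_time_series(records)
print(series)
print(unmapped)
```

Returning the unmapped codes separately is a deliberate choice in this sketch: codes that fall outside the cause list, such as ill-defined causes, remain available for later review or redistribution.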