Peter Byass and colleagues compare two methods of interpreting verbal autopsy data, physician review and probabilistic modeling, and show that probabilistic modeling is the most efficient means of analyzing these data.
Background
Cause of death data are an essential source for public health planning, but their availability and quality are lacking in many parts of the world. Interviewing family and friends after a death has occurred (a procedure known as verbal autopsy) provides a source of data where deaths otherwise go unregistered; but sound methods for interpreting and analysing the ensuing data are essential. Two main approaches are commonly used: either physicians review individual interview material to arrive at probable cause of death, or probabilistic models process the data into likely cause(s). Here we compare and contrast these approaches as applied to a series of 6,153 deaths which occurred in a rural South African population from 1992 to 2005. We do not attempt to validate either approach in absolute terms.
Methods and Findings
The InterVA probabilistic model was applied to a series of 6,153 deaths which had previously been reviewed by physicians. Physicians used a total of 250 cause-of-death codes, many of which occurred very rarely, while the model used 33. Cause-specific mortality fractions, overall and for population subgroups, were derived from the model's output, and the physician causes coded into comparable categories. The ten highest-ranking causes accounted for 83% and 88% of all deaths by physician interpretation and probabilistic modelling respectively, and eight of the highest ten causes were common to both approaches. Top-ranking causes of death were classified by population subgroup and period, as done previously for the physician-interpreted material. Uncertainty around the cause(s) of individual deaths was recognised as an important concept that should be reflected in overall analyses. One notably discrepant group involved pulmonary tuberculosis as a cause of death in adults aged over 65, and these cases are discussed in more detail, but the group only accounted for 3.5% of overall deaths.
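One way the uncertainty around individual deaths could be carried into cause-specific mortality fractions is to let each death contribute fractionally to every cause it was assigned, weighted by likelihood. The Python sketch below illustrates that idea only; it is not the InterVA implementation. The function name `csmf_from_likelihoods`, the input layout, the choice to count any unassigned likelihood as "indeterminate", and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def csmf_from_likelihoods(deaths):
    """Return cause-specific mortality fractions from per-death likelihoods.

    `deaths` is a list of dicts mapping cause -> likelihood (0..1) for one
    death. Likelihood not assigned to any cause is counted as
    "indeterminate" (an assumption of this sketch, not a documented rule).
    """
    if not deaths:
        raise ValueError("at least one death is required")

    totals = defaultdict(float)
    for cause_likelihoods in deaths:
        assigned = sum(cause_likelihoods.values())
        if assigned > 1.0 + 1e-9:
            raise ValueError("likelihoods for one death exceed 1")
        # Each death adds its likelihood-weighted share to every cause.
        for cause, likelihood in cause_likelihoods.items():
            totals[cause] += likelihood
        # Whatever is left over is treated as indeterminate.
        residual = 1.0 - assigned
        if residual > 1e-9:
            totals["indeterminate"] += residual

    n = len(deaths)
    return {cause: total / n for cause, total in totals.items()}


# Hypothetical toy data: four deaths, not taken from the study.
toy_deaths = [
    {"HIV": 0.8, "tuberculosis": 0.2},
    {"HIV": 0.5},
    {"pneumonia/sepsis": 1.0},
    {},
]
fractions = csmf_from_likelihoods(toy_deaths)
for cause, fraction in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{cause}: {fraction:.3f}")
```

Because every death contributes a total weight of one, the resulting fractions sum to one across causes, and a death with an uncertain cause is not forced into a single category.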
Conclusions
There were no differences between physician interpretation and probabilistic modelling that might have led to substantially different public health policy conclusions at the population level. Physician interpretation was more nuanced than the model, for example in identifying cancers at particular sites, but did not capture the uncertainty associated with individual cases. Probabilistic modelling was substantially cheaper and faster, and completely internally consistent. Both approaches characterised the rise of HIV-related mortality in this population during the period observed, and reached similar findings on other major causes of mortality. For many purposes probabilistic modelling appears to be the best available means of moving from data on deaths to public health actions.
Please see later in the article for the Editors' Summary
Whenever someone dies in a developed country, the cause of death is determined by a doctor and entered into a “vital registration system,” a record of all the births and deaths in that country. Public-health officials and medical professionals use this detailed and complete information about causes of death to develop public-health programs and to monitor how these programs affect the nation's health. Unfortunately, in many developing countries dying people are not attended by doctors and vital registration systems are incomplete. In most African countries, for example, less than one-quarter of deaths are recorded in vital registration systems. One increasingly important way to improve knowledge about the patterns of death in developing countries is “verbal autopsy” (VA). Using a standard form, trained personnel ask relatives and caregivers about the symptoms that the deceased had before his/her death and about the circumstances surrounding the death. Physicians then review these forms and assign a specific cause of death from a shortened version of the International Classification of Diseases, a list of codes for hundreds of diseases.
Why Was This Study Done?
Physician review of VA forms is time-consuming and expensive. Consequently, computer-based, “probabilistic” models have been developed that process the VA data and provide a likely cause of death. These models are faster and cheaper than physician review of VAs and, because they do not rely on the views of local doctors about the likely causes of death, they are more internally consistent. But are physician review and probabilistic models equally sound ways of interpreting VA data? In this study, the researchers compare and contrast the interpretation of VA data by physician review and by a probabilistic model called the InterVA model by applying these two approaches to the deaths that occurred in Agincourt, a rural region of northeast South Africa, between 1992 and 2005. The Agincourt health and sociodemographic surveillance system is a member of the INDEPTH Network, a global network that is evaluating the health and demographic characteristics (for example, age, gender, and education) of populations in low- and middle-income countries over several years.
What Did the Researchers Do and Find?
The researchers applied the InterVA probabilistic model to 6,153 deaths that had been previously reviewed by physicians. They grouped the 250 cause-of-death codes used by the physicians into categories comparable with the 33 cause-of-death codes used by the InterVA model. From the output of both approaches they then derived cause-specific mortality fractions (the proportions of deaths attributable to specific causes) for the whole population and for subgroups (for example, deaths in different age groups and deaths occurring over specific periods of time). The ten highest-ranking causes of death accounted for 83% and 88% of all deaths by physician interpretation and by probabilistic modelling, respectively. Eight of the ten most frequent causes of death (HIV, tuberculosis, chronic heart conditions, diarrhea, pneumonia/sepsis, transport-related accidents, homicides, and indeterminate) were common to both interpretation methods. Both methods coded about a third of all deaths as indeterminate, often because of incomplete VA data. Generally, there was close agreement between the methods for the five principal causes of death for each age group and for each period of time. One notable discrepancy was pulmonary (lung) tuberculosis in adults aged over 65, which accounted for 6.4% of deaths in this age group according to the physicians and 21.3% according to the model. However, these deaths accounted for only 3.5% of all the deaths.
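The comparison described above (cause-specific mortality fractions from each method, the share of deaths covered by the top-ranked causes, and the causes shared between the two rankings) can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' analysis code: the helper names `csmf`, `top_k`, and `compare_top_k` are hypothetical, each death is assumed to carry a single cause already mapped to a common category, and the toy data are invented.

```python
from collections import Counter


def csmf(causes):
    """Cause-specific mortality fractions from one assigned cause per death."""
    if not causes:
        raise ValueError("at least one death is required")
    counts = Counter(causes)
    n = len(causes)
    return {cause: count / n for cause, count in counts.items()}


def top_k(fractions, k):
    """The k largest fractions; ties are broken alphabetically by cause."""
    ranked = sorted(fractions.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


def compare_top_k(fractions_a, fractions_b, k):
    """Coverage of the top-k causes in each ranking and the causes they share."""
    top_a = top_k(fractions_a, k)
    top_b = top_k(fractions_b, k)
    shared = {cause for cause, _ in top_a} & {cause for cause, _ in top_b}
    return {
        "coverage_a": sum(fraction for _, fraction in top_a),
        "coverage_b": sum(fraction for _, fraction in top_b),
        "shared": sorted(shared),
    }


# Hypothetical toy data: six deaths coded by each method (not study data).
physician = ["HIV", "HIV", "tuberculosis", "diarrhea", "homicide", "indeterminate"]
model = ["HIV", "HIV", "HIV", "tuberculosis", "indeterminate", "indeterminate"]

result = compare_top_k(csmf(physician), csmf(model), k=2)
print(result)
```

With the study's data, `k=10` would correspond to the comparison of the ten highest-ranking causes; the same functions could be applied to each age group or time period separately.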
What Do These Findings Mean?
These findings reveal no differences between the cause-specific mortality fractions determined from VA data by physician interpretation and by probabilistic modelling that might have led to substantially different public-health policy programmes being initiated in this population. Importantly, both approaches clearly chart the rise of HIV-related mortality in this South African population between 1992 and 2005 and reach similar findings on other major causes of mortality. The researchers note that, although preparing the amount of VA data considered here for entry into the probabilistic model took several days, the model itself runs very quickly and always gives consistent answers. Given these findings, the researchers conclude that in many settings probabilistic modeling represents the best means of moving from VA data to public-health actions.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000325.
The importance of accurate data on death is further discussed in a PLoS Medicine Perspective by Colin Mathers and Ties Boerma
The World Health Organization (WHO) provides information on the vital registration of deaths and on the International Classification of Diseases; the WHO Health Metrics Network is a global collaboration focused on improving sources of vital statistics; and the WHO Global Health Observatory brings together core health statistics for WHO member states
The INDEPTH Network is a global collaboration that is collecting health statistics from developing countries; it provides more information about the Agincourt health and socio-demographic surveillance system and access to standard VA forms
Information on the Agincourt health and sociodemographic surveillance system is available on the University of the Witwatersrand Web site
The InterVA Web site provides resources for interpreting verbal autopsy data; the Umeå Centre for Global Health Research, where the InterVA model was developed, is found at http://www.globalhealthresearch.net
A recent PLoS Medicine Essay by Peter Byass, lead author of this study, discusses The Unequal World of Health Data