Scand J Public Health. Author manuscript; available in PMC 2010 March 8.
PMCID: PMC2833983

Refining a probabilistic model for interpreting verbal autopsy data



Aims: To build on the previously reported development of a Bayesian probabilistic model for interpreting verbal autopsy (VA) data, by improving the model's performance in determining cause of death, and to reassess that performance.


Methods: An expert group of clinicians, drawn from a wide range of geographical and specialist backgrounds, was convened. Over a 4-day period the content of the previous probabilistic model was reviewed in detail and adjusted as necessary to reflect the group's consensus. The revised model was tested against the same 189 VA cases from Vietnam, previously assessed by two local clinicians, that were used to test the preliminary model.


Results: The revised model contained a total of 104 indicators that could be derived from VA data and 34 possible causes of death. When it was applied to the 189 Vietnamese cases, 142 (75.1%) showed concordance between the model's output and the previous clinical consensus. The remaining 47 cases (24.9%) were presented to a further independent clinician for reassessment. Consensus between clinical reassessment and the model's output was achieved in 28 cases (14.8%); clinical reassessment and the original clinical opinion agreed in 8 cases (4.2%); and in the remaining 11 cases (5.8%) the clinical reassessment, the model and the original clinical opinion all differed. Overall, the model was therefore considered to have performed well in 170 cases (89.9%).


Conclusions: This approach to interpreting VA data continues to show promise; the next step is to evaluate it against other sources of VA data. The expert-group approach to determining the required probability base proved productive in improving the performance of the model.


Verbal autopsy (VA) is the process of eliciting information about the circumstances of a death from family or friends of the recently deceased person in cases where medical certification of death is incomplete or absent [1-4]. It is a useful surrogate for routine death registration in resource-poor settings and has been used to estimate cause-specific mortality [5]. Physician review of VA data, whereby data are assessed by one or more physicians who assign a probable cause of death, has been shown to be a reliable method for VA interpretation [2]. However, issues of standardisation between different physicians, and of relying on the consistency of physicians over time, hinder reliable temporal and regional comparisons of mortality [6]. In addition, the time that physicians must devote to assessing large numbers of VAs is a serious burden in areas with insufficient medical personnel. Algorithms have the potential to address these concerns [7] but raise others, such as reliability and the difficulty of considering parallel possibilities along the lines of classic clinical differential diagnoses.

A preliminary model for VA interpretation based on Bayes' theorem was developed in an attempt to overcome the weaknesses of physician review and algorithmic approaches. Bayes' theorem seeks to define the probability of a cause (C) given the presence of a particular indicator (I), and can be represented as:

P(C|I) = [P(I|C) × P(C)] / [P(I|C) × P(C) + P(I|!C) × P(!C)]

where P(!C) is the probability of not (C), i.e. 1 − P(C).

The probability of occurrence of each indicator (I1…In) and each possible cause of death (C1…Cm) can be determined at the population level, which in this case means among all deaths. Thus, for a particular case, the probability of Ck is initially the value found among all deaths in general. However, for each case and each applicable indicator, the probability of Ck can be modified by the above theorem. The VA interpretation model adjusts the probability of each possible cause according to a matrix of P(I1…In | C1…Cm) and lists up to three likely causes. In the preliminary model the set of indicators and causes was influenced by INDEPTH's proposed VA questionnaire [8], and the associated probabilities were estimates based on accumulated personal experience.
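The iterative updating described above can be sketched as follows. All probability values here are hypothetical, chosen only to illustrate the mechanism; the actual model draws its values from the indicator-by-cause probability matrix.

```python
# Sketch of the Bayesian updating step described above.
# All probability values are hypothetical, for illustration only.

def update(prior_c, p_i_given_c, p_i_given_not_c):
    """Apply Bayes' theorem for one reported indicator I and one cause C."""
    p_not_c = 1.0 - prior_c
    numerator = p_i_given_c * prior_c
    return numerator / (numerator + p_i_given_not_c * p_not_c)

# Start from the population-level probability of the cause among all deaths,
# then update once for each indicator reported in the verbal autopsy.
prior = 0.05                                  # hypothetical P(C) among all deaths
for p_ic, p_inc in [(0.9, 0.1), (0.6, 0.3)]:  # hypothetical P(I|C), P(I|!C) pairs
    prior = update(prior, p_ic, p_inc)

print(round(prior, 3))                        # posterior after both indicators
```

Each reported indicator thus raises or lowers each candidate cause's probability, which is how the model can weigh parallel possibilities in the manner of a clinical differential diagnosis.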

Initial validation of the preliminary model was carried out on a set of 189 VAs from rural Vietnam, which had previously been assessed by two physicians leading to a consensus on a single cause of death for each case. Over 70% of individual causes of death corresponded with those determined by the physicians, increasing to over 80% when cases ascribed to ‘old age’ or ‘indeterminate’ by the physicians were excluded. A more detailed background to the preliminary model and its initial validation are described elsewhere [6].

Following validation of the preliminary model it was deemed appropriate to refine the probabilities used in the model and to address underlying conceptual issues of VA data collection and interpretation. This paper describes the development of this probabilistic approach to VA interpretation using an expert Delphi technique. Validation of the updated and refined model on the same 189 cases from Vietnam and a comparison between the performance of the preliminary and updated models are also described.


The Delphi technique is an approach used to gain consensus among a panel of experts in order to address a lack of agreement or incomplete state of knowledge [9, 10]. The technique was adopted here to develop consensus on probabilities of different causes of death occurring at the population level and probabilities of specific signs and symptoms presenting themselves at the population level and in specific causes of death. The technique was also utilised to develop consensus on key conceptual issues of cause of death classification and VA usage.

An expert group convened over four consecutive days. The group comprised five physicians (YB, TC, KK, LM, DDV) with extensive clinical experience in resource-poor settings. They represented a range of important disciplines of medicine: surgery; maternal and reproductive health; paediatrics; and internal medicine. The experts came from a range of settings in developing or transitional countries where routine death registration is often absent (South Africa, Ethiopia, The Gambia and Vietnam). It was felt that the range of backgrounds and geographical spread of the expert group would lead to a generalised consensus not specific to any one region or medical discipline. Each member of the expert group was either experienced in or very familiar with the process, importance and limitations of VAs, and all were briefed on the probabilistic approach to VA interpretation.

The researchers facilitated discussions in which the experts were requested to consider the inclusion of indicators and causes of death in the model, bearing in mind that friends or relatives of the deceased person must be able to notice and report indicators to the lay fieldworkers [5]. A list of 34 possible causes of death and 104 indicators was developed (Table 1). Probabilities were agreed upon and assigned to each indicator and cause of death, at the population level and for each specific cause of death, using a semi-qualitative scale following work by Kong et al. (1986) [11] (Table 2). A higher degree of precision was not sought, since previous work suggests that this is not essential in order to build a workable model [12].

Table 1
Verbal autopsy indicators and causes of death used in the refined model.
Table 2
The semi-qualitative scale used for assigning probabilities of indicators and causes in the refined model.

There was strong consensus among the physicians that the probabilities of causes of death whose population-level prevalence varies widely between regions, such as HIV/AIDS and malaria, should be adjustable in the model to reflect the local burden of these diseases. It was felt that a regional variation in disease prevalence of at least ten-fold was needed to warrant adjustment of the database. It was not felt necessary to adjust the database to reflect regional variations in causes of death with very specific indicators, such as meningitis or transport accidents. The revised model therefore included a facility to reflect either high or low prevalence for HIV/AIDS and malaria.
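One way such a prevalence switch could work is sketched below, purely as an illustration: the population-level prior for an adjustable cause is scaled ten-fold up or down and the prior distribution renormalised. The cause list and prior values are hypothetical, and the paper does not specify the model's actual adjustment mechanism.

```python
# Hypothetical illustration of a high/low prevalence switch: the
# population-level prior for an adjustable cause (here malaria) is scaled
# ten-fold up or down, and the prior distribution is then renormalised.

def adjust_priors(priors, cause, factor):
    """Scale one cause's prior by `factor` and renormalise to sum to 1."""
    adjusted = dict(priors)
    adjusted[cause] *= factor
    total = sum(adjusted.values())
    return {c: p / total for c, p in adjusted.items()}

# Hypothetical population-level priors (not the model's actual values).
priors = {"malaria": 0.02, "hiv/aids": 0.05, "other": 0.93}
high_malaria = adjust_priors(priors, "malaria", 10.0)  # high-endemicity setting
low_malaria = adjust_priors(priors, "malaria", 0.1)    # low-endemicity setting
```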

The model was updated using Visual FoxPro database software to adjust probabilities and to remove or insert causes and indicators. The revised model's output was modified to show more than one cause of death only if the probability of the additional cause(s) was within 20% of that of the most likely cause. This is in contrast to the preliminary model, which always gave the three most likely causes irrespective of their probabilities. The model was also adjusted so that certain causes of death were extremely unlikely to be diagnosed without the presence of specific indicators. For example, it is highly unlikely that the model will conclude that death resulted from diarrhoeal disease without the symptom of diarrhoea being reported. Each member of the expert group was provided with a working prototype of the model and given the opportunity to test it on hypothetical cases to highlight any inconsistencies and anomalies.
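The revised output rule can be sketched as follows, assuming "within 20%" means within 20% of the leading cause's probability in relative terms (the posterior values and cause names below are hypothetical):

```python
# Sketch of the revised output rule: report up to three causes, but include
# an additional cause only if its probability is within 20% of the most
# likely cause's probability. (Hypothetical posteriors for illustration.)

def likely_causes(posteriors, window=0.20, max_causes=3):
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    top_p = ranked[0][1]
    return [(c, p) for c, p in ranked[:max_causes] if p >= top_p * (1 - window)]

posteriors = {"pneumonia": 0.40, "tuberculosis": 0.35, "heart failure": 0.10}
print(likely_causes(posteriors))
# tuberculosis (0.35 >= 0.8 * 0.40) is reported; heart failure is not
```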

The updated probabilistic model was applied to the VA data from the same 189 Vietnamese cases used to validate the preliminary model. Indicators were gathered from the original VA questionnaires and included open-ended, free-text information. These data and the underlying VA process used in Vietnam are described in detail elsewhere [3]. Comparisons were made with the cause of death as previously agreed by the two local physicians in Vietnam and with the results from the preliminary model.

Many studies aiming to validate VA interpretation methodologies against hospital records or physician review describe sensitivities, specificities and positive predictive values (PPV) [1, 13, 14]. However, the calculation of such statistics assumes that the referent diagnosis is correct and constitutes an absolute gold standard. This assumption is flawed given the inconsistencies of physician review, and studies describing the sensitivity, specificity and PPV of VA methods often discuss the possibility that in certain cases the VA diagnosis may be more accurate than the diagnosis provided by physician review or hospital records [13, 14]. It was therefore considered inappropriate to calculate sensitivities, specificities or PPV for the probabilistic model in this validation study. Instead, kappa (κ) values are calculated, since these simply reflect the level of agreement between two methods and do not imply the superiority of one over the other.
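Cohen's kappa used for the agreement statistics reported below can be computed as in this minimal sketch; the cause-of-death labels are hypothetical and are not the study's data.

```python
# Minimal Cohen's kappa for agreement between two sets of cause-of-death
# assignments. The labels below are hypothetical, not the study's data.

from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement expected from each rater's marginal frequencies.
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

model = ["malaria", "tb", "malaria", "other", "tb", "other"]
physician = ["malaria", "tb", "other", "other", "tb", "tb"]
print(cohens_kappa(model, physician))  # agreement beyond chance
```

Because kappa discounts the agreement expected by chance from each rater's marginal frequencies, it is a fairer summary of concordance than raw percentage agreement.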


In 142/189 cases (75.1%) the cause of death as determined by the refined model agreed with the consensus of the two original assessing physicians (κ (95% CI) = 0.50 (0.42–0.59)). In a number of the indeterminate and contradictory cases it was not clear why the physicians' conclusion should be preferred over the model's. The remaining 47 cases (24.9%) were therefore presented to a further experienced clinician, who had been involved neither in the original assessment nor in the model's development, for reassessment. Clinical reassessment agreed with the model's output in 28 of the 47 cases (59.6%; 14.8% of all 189 cases; κ (95% CI) = 0.80 (0.74–0.86)); it agreed with the original clinical opinion in 8 cases (4.2% of all cases; κ (95% CI) = 0.66 (0.51–0.81)); and in the remaining 11 cases (5.8%) the clinical reassessment, the model and the original clinical opinion all differed. Thus overall the model was considered to have performed well in 170 cases (89.9%). This is a substantial improvement over the preliminary model, for which 134/189 cases (70.9%) were in agreement (κ (95% CI) = 0.42 (0.33–0.51)), 34 cases (18.0%) were indeterminate, and 21 cases (11.1%) were contradictory.

Figure 1 shows cause-specific mortality fractions (CSMFs) for the 189 deaths, separately derived from the most likely causes from the refined probabilistic model; weighted multiple causes from the refined model (e.g. assigning 1/3 of a death to each of three likely causes); the original physicians' verdict; the physicians' verdict following reassessment of 47 cases by a third independent physician; and the most likely causes from the preliminary, unrefined model.

Figure 1
Cause-specific mortality fractions for major causes of death, derived from 189 verbal autopsies in Vietnam, according to the most likely causes from the refined probabilistic model, weighted multiple causes from the refined model, the original physicians' ...


The results from comparing the model's output with that of the physicians are very encouraging, given that neither the development of the model nor its underlying probabilities was specifically linked to the Vietnamese VA process or setting. The improved performance of the refined model compared with the preliminary model illustrates the effectiveness of the Delphi approach.

The development of this approach to VA interpretation has highlighted many of the unanswered questions around the whole process of VA data collection and interpretation. For example, not all indicators available in the data were built into the model, and not all indicators built into the model are routinely available in the data. Such mismatches may have reduced the model's overall performance and there is scope for development of more standardised VA data collection tools. Other issues include variations in concepts and definitions of ‘acute’ and ‘chronic’ between physicians and regions, and difficulties in determining the sequence of events and identifying immediate versus antecedent and underlying causes of death from VA data. With regards to the latter point, it was felt that identification of the deceased's primary complaint or most prominent indicator before death could overcome some of the issues of sequencing events. It may be possible to include questions of this nature into VA questionnaires and introduce composite indicators into the model.

All validations require a suitable ‘gold standard’ for comparison. Although physician interpretation is often considered the gold standard for VA interpretation, there is potential for misclassification and misinterpretation [2, 15]. This is highlighted by the fact that only 8/47 cases (17%) presented to a third physician for reassessment in this study reached consensus with the original clinical opinion. As such, physician interpretation of VA data must be used cautiously as a gold standard for validations [4]. It was therefore considered more appropriate in this study to compare the probabilistic approach with physician review in terms of agreement rather than sensitivity, specificity or PPV.

The expert group felt that ‘old age’ should not be counted as an acceptable diagnosis of cause of death, and it was decided to eliminate ‘non-specific old age’ as a possible cause from the model. Although the usefulness of gathering specific cause of death data in elderly people, where there is little notion or possibility of implementing interventions, is questionable, mortality data for the elderly have the potential to be useful in evaluating interventions implemented when the ‘old’ generation was the ‘middle-aged’ generation. Furthermore, there are considerable cultural and regional variations in the concept of old age. As such, cases where the physicians' verdict was ‘old age’ were considered indeterminate during the validation process. For the purposes of population health surveillance it would be useful to have standardised age categories across INDEPTH sites, and indeed globally, but since that is not the case the expert group settled on age groups that made sense from a clinical perspective.

Adjusting the model's database to reflect local conditions of malaria endemicity and HIV/AIDS prevalence worked well in improving the performance of the model. This highlights the potential of this probabilistic approach for standardised VA interpretation across a range of settings. Further testing of the refined model with more data from a wider range of settings may identify other key local characteristics that can be reflected in the model. On the same principles it may also be possible to develop databases for the analysis of VA data relating to specific sub-categories, such as particular age groups or maternal deaths. This may provide more specific details of causes of death in populations with significant potential for public health interventions [16]. The principle of adjusting for prevalence suggests that it may also be possible, or necessary, to adjust disease prevalences in the light of successful intervention programmes. However, assessing the effectiveness of interventions is difficult given the lack of reliable data in the settings where the VA model will be applied, so it is unrealistic to make such adjustments at present.

Assuming that the VA process is intended to mimic, as far as possible, the process of physician death certification, the probabilistic approach to VA interpretation described here represents an improvement over current interpretation methods. The identification of multiple causes of death mimics classic physician assessment and has the added potential to facilitate cause of death surveillance at the community level, where unsubstantiated choices between possible causes of death at the individual level are often made. Further thought and discussion are needed about how to interpret and analyse multiple causes of death for individual cases. One suggestion is that a death can be divided proportionately between the different causes. For example, if the model lists chronic heart failure and pneumonia as two likely causes of death, it would be reasonable to assign 50% of that death to each cause. This does not greatly affect CSMFs, as illustrated in Figure 1, and may in fact be more useful from an epidemiological perspective than making arbitrary choices between different likely causes.
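The proportional assignment just described can be sketched as follows, with each death divided equally among its likely causes before aggregating into cause-specific mortality fractions; the cases below are hypothetical.

```python
# Sketch of cause-specific mortality fractions (CSMFs) with each death
# divided equally among its likely causes. (Hypothetical cases only.)

from collections import defaultdict

def csmf(cases):
    """cases: one list of likely causes per death."""
    fractions = defaultdict(float)
    for causes in cases:
        share = 1.0 / len(causes)      # e.g. 50% each for two likely causes
        for cause in causes:
            fractions[cause] += share
    n = len(cases)
    return {cause: total / n for cause, total in fractions.items()}

cases = [["chronic heart failure", "pneumonia"], ["pneumonia"], ["malaria"]]
print(csmf(cases))
```

Because each death still contributes exactly one unit in total, the fractions remain a valid mortality distribution while avoiding arbitrary choices between competing causes.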

Standardisation over time and between regions is a major advantage of this approach to VA interpretation. The benefits of complete standardisation may justify a trade-off against complete accuracy when physician review is used as a somewhat flawed gold standard. In addition, the probabilistic model can interpret VA cases at a rate of approximately two per second and does not require extensive expertise to operate, thereby greatly increasing efficiency and freeing up physicians' time. The next step in the development of this system is to test the refined model with more extensive data from a wider range of settings and sources.


1. Chandramohan D, et al. Verbal autopsies for adult deaths: their development and validation in a multicentre study. Tropical Medicine and International Health. 1998;3:436–446. [PubMed]
2. Quigley MA, Chandramohan D, Rodrigues LC. Diagnostic accuracy of physician review, expert algorithms and data-derived algorithms in adult verbal autopsies. International Journal of Epidemiology. 1999;28:1081–7. [PubMed]
3. Huong DL, Minh HV, Byass P. Applying verbal autopsy to determine cause of death in rural Vietnam. Scandinavian Journal of Public Health. 2003;31(Supp. 62):19–25. [PubMed]
4. Chandramohan D, Setel P, Quigley M. Effect of misclassification of cause of death in verbal autopsy: can it be adjusted? International Journal of Epidemiology. 2001;30:509–14. [PubMed]
5. Okosun IS, Dever GE. Verbal autopsy: a necessary solution for the paucity of mortality data in the less-developed countries. Ethnicity & Disease. 2001;11(4):575–7. [PubMed]
6. Byass P, Huong DL, Minh HV. A probabilistic approach to interpreting verbal autopsies: methodology and preliminary validation in Vietnam. Scandinavian Journal of Public Health. 2003;31(Supp. 62):32–37. [PubMed]
7. Reeves BC, Quigley MA. A review of data-derived methods for assigning causes of death from verbal autopsy data. International Journal of Epidemiology. 1997;26:1080–9. [PubMed]
8. INDEPTH Network. Population and Health in Developing Countries, Volume 1: Population, Health and Survival. IDRC; Ottawa: 2002.
9. Keeney S, Hasson F, McKenna HP. A critical review of the Delphi technique as a research methodology for nursing. International Journal of Nursing Studies. 2001;38(2):195–200. [PubMed]
10. Powell C. The Delphi technique: myths & realities. Journal of Advanced Nursing. 2003;41(4):376–382. [PubMed]
11. Kong A, et al. How medical professionals evaluate expressions of probability. New England Journal of Medicine. 1986;315:740–4. [PubMed]
12. Byass P, Corrah PT. Assessment of a probabilistic decision support methodology for tropical health care. In: Medinfo-89. Amsterdam, The Netherlands: 1989.
13. Kahn K, et al. Validation and application of verbal autopsies in a rural area of South Africa. Tropical Medicine and International Health. 2000;5(11):824–831. [PubMed]
14. Chandramohan D, et al. The validity of verbal autopsies for assessing the causes of institutional maternal death. Studies in Family Planning. 1998;29(4):414–422. [PubMed]
15. Anker M. The effect of misclassification error on reported cause-specific mortality fractions from verbal autopsy. International Journal of Epidemiology. 1997;26:1090–6. [PubMed]
16. Høj L, Stensballe J, Aaby P. Maternal mortality in Guinea-Bissau: the use of verbal autopsy in a multi-ethnic population. International Journal of Epidemiology. 1999;28:70–76. [PubMed]