Correspondence to: Mohamad Amin Pourhoseingholi, PhD, Basic and Molecular Epidemiology of Gastrointestinal Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Arabi Ave, Daneshjoo Blvd, Velenjak, Tehran 1985717413, Iran. amin_phg@yahoo.com
Telephone: +98-21-22432515 Fax: +98-21-22432517
To correct for misclassification error in the registration of causes of death in the Iranian death registry using a Bayesian method.
National death statistics for gastric cancer from 2006 to 2010, reported annually by the Ministry of Health and Medical Education, were included in this study. To correct the gastric cancer mortality rate by reassigning gastric cancer deaths that had been registered as cancer without detail, a Bayesian method was implemented with Poisson count regression and a beta prior for the misclassification parameter, assuming 20% misclassification in registering causes of death in Iran.
Registered mortality due to gastric cancer from 2006 to 2010 was considered in this study. According to the Bayesian re-estimate, about 3%-7% of deaths due to gastric cancer were registered as cancer without mention of details. This produces an undercount of gastric cancer mortality in the Iranian population. The number and age standardized rate of gastric cancer deaths are estimated to be 5805 (10.17 per 100000 population), 5862 (10.51 per 100000 population), 5731 (10.23 per 100000 population), 5946 (10.44 per 100000 population), and 6002 (10.35 per 100000 population) for the years 2006 to 2010, respectively.
There is an undercount of gastric cancer mortality in Iranian registered data, which researchers and authorities should take into account in subsequent estimations and policy making.
Core tip: In some mortality cases, the registered cause of death is one that cannot or should not be considered the underlying cause of death, such as cancer without mention of the type. These cases are not included in estimates of cause specific mortality rates, which leads to under-estimation of health risks and burden of disease. The aim of this study is to correct the misclassification of gastric cancer deaths into the cancer without label group using a Bayesian method.
Cancer is one of the major health problems in the world and is the third leading cause of death (after cardiovascular disease and injuries) in Iran. Gastric cancer is a disease in which the cells of the inner lining of the stomach start to divide abnormally and uncontrollably, forming a mass called a tumor. Gastric cancer is the seventh leading cause of all deaths in Iran; it is the first cause of cancer death in Iranian men and the second cause of cancer death (after breast cancer) in Iranian women. The mortality of gastric cancer is high because this cancer does not show symptoms in its early stages and is diagnosed only when it has reached its final stages.
Burden of disease is used to evaluate the health status of a country and to prioritize risk factors when setting up cancer control programs. Cancer registry data are important for estimating the burden of disease and for monitoring the effects of screening programs, early diagnosis and other prognostic factors, and they can guide policy makers toward appropriate cancer prevention programs. Among medical indices, mortality is a familiar measure for assessing the burden of disease. Achieving this aim, however, requires a reliable death registry system that reports death statistics accurately and completely[5-7]. In Iran, among the four vital events (births, marriages, divorces and deaths) registered by the National Organization for Civil Registration (NOCR), mortality had the poorest quality. There was some progress in registering deaths, but problems such as delayed registration and inaccurate recording of causes of death remained until 2002, when the Deputy of Research and Technology of the Ministry of Health and Medical Education started a system to record causes of death. This system did not allow delayed registration of deaths, but the recorded causes of death remained susceptible to information bias due to misclassification. Most high-income and many middle-income countries have a complete vital registration system in which the majority of deaths receive a death certificate completed by a physician. Even so, for a number of causes of death, completing the death certificate and coding the underlying cause of death according to standardized international rules remain challenging[10-13]. In some cases, especially in developing countries, the cause of death is recorded with error[14,15]. For example, if a death due to gastric cancer is labeled as a death due to any other cause, a misclassification error in the outcome occurs.
Misclassification error makes the registered data inaccurate and often leads to major problems like biased estimates of burden and health risks in epidemiological analysis[16,17].
According to the Iranian death registry, about 15% to 20% of death statistics are recorded in misclassified categories such as cardiopulmonary arrest, old age without dementia, septicemia, unknown, cancer without mention of details, and other ill-defined conditions. Murray and Lopez in 1996, for the first time, introduced the term “garbage coding” for assigning deaths to causes that are not useful for public health analysis of cause-of-death data[18-21].
In developing countries like Iran, where registration is not completely accurate, statistical methods can be very helpful in overcoming this problem. Two statistical approaches are recommended for dealing with misclassification. The first is to use a small validation sample and extend the results to the population. The second is Bayesian analysis, a flexible method that combines prior information about a subset of the parameters with the observed data to obtain a posterior distribution, which is then the basis of the inferences used to correct the statistics. Bayesian models can also easily accommodate unobserved variables, such as an individual's true status in the presence of misclassification error. The aim of this study is to use a Bayesian method to estimate the rate of misclassification that occurs when cancer (with no label) is registered as the cause of death for deaths that were actually due to gastric cancer in Iran's cancer registry system.
Mortality rates due to gastric cancer and to cancer without label from 2006 to 2010 were extracted from the Iranian annual death statistics, reported annually by Iran's Ministry of Health and Medical Education, for two sex groups (male and female) and four age groups (under 15 years, 15 to 49 years, 50 to 69 years, and 70 years and more).
To reassign deaths from garbage codes to valid causes, the approach can be divided into three steps. The first is identifying the garbage codes. The second is identifying the target causes to which deaths assigned to a garbage code should in principle be reassigned; for example, if a cause of death is registered as cancer and the type of cancer is not mentioned, we face a garbage code that should be reassigned to a specific cancer. The third is choosing the fraction of deaths assigned to the garbage code that should be reallocated to the target cause. In this study we consider cancer without label as the garbage code, because cancer with no label is most likely to be registered as the cause of death in place of a specific cancer such as gastric cancer. The data were entered into the Bayesian model as two vectors, y_1 = [y_11, y_21, ..., y_r1] for gastric cancer and y_2 = [y_12, y_22, ..., y_r2] for cancer without label. Both y_1 and y_2 are count data and follow the Poisson distribution. The subscript r is the number of covariate patterns formed by the age and sex group combinations. θ is the probability of incorrectly registering a death from gastric cancer as a death in the cancer without label group. To perform Bayesian inference, an informative beta prior distribution was assumed for the misclassification parameter, i.e., θ ~ Beta(a, b). The parameters of the beta distribution were set to a = 20 and b = 80, based on Iranian annual cancer registration reports. Since θ (the misclassification parameter) is unknown, a latent variable approach was employed to simplify the full conditional models: U_i | θ, y_1, y_2 ~ Binomial(y_i2, P_i) is the number of counts from the first group that are incorrectly labeled as being in the misclassified group, where P_i = (λ_i1 θ)/(λ_i1 θ + λ_i2). The posterior distribution then takes the form θ | U_i, y_1, y_2 ~ Beta(Σ_i U_i + a, Σ_i y_i + b).
The misclassification parameter was estimated using a Gibbs sampling algorithm and averaging the sampled values. Analyses were done using R software version 3.2.0.
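The sampling scheme described above can be sketched roughly as follows. This is an illustration in Python and not the authors' R code. It makes several assumptions that are not in the text: the Poisson regression on age and sex is replaced by an independent rate per covariate pattern; each rate gets a conjugate Gamma prior with the hypothetical hyperparameters `lam_shape` and `lam_rate`; the sum Σ_i y_i in the beta full conditional is read as the sum of the registered gastric cancer counts y_i1; and the function name `gibbs_theta` and the toy counts are invented for the example.

```python
import random


def gibbs_theta(y1, y2, a=20.0, b=80.0, n_iter=2000, burn_in=500,
                lam_shape=0.5, lam_rate=0.01, seed=0):
    """Gibbs sampler sketch for the misclassification probability theta.

    y1: registered gastric cancer deaths per covariate pattern
    y2: registered "cancer without label" deaths per covariate pattern
    a, b: parameters of the Beta prior on theta (20 and 80 in the text)
    lam_shape, lam_rate: ASSUMED Gamma prior on each Poisson rate
    """
    rng = random.Random(seed)
    r = len(y1)
    theta = a / (a + b)                # start at the prior mean
    lam1 = [y + 1.0 for y in y1]       # crude starting values for the rates
    lam2 = [y + 1.0 for y in y2]
    draws = []
    for it in range(n_iter):
        # Latent counts: U_i | ... ~ Binomial(y_i2, P_i),
        # with P_i = lam_i1 * theta / (lam_i1 * theta + lam_i2)
        u = []
        for i in range(r):
            p = lam1[i] * theta / (lam1[i] * theta + lam2[i])
            u.append(sum(rng.random() < p for _ in range(y2[i])))
        # theta | U, y ~ Beta(sum U_i + a, sum y_i1 + b)  (y_i read as y_i1)
        theta = rng.betavariate(sum(u) + a, sum(y1) + b)
        # Conjugate rate updates under the assumed Gamma priors;
        # random.gammavariate takes (shape, scale), so scale = 1 / rate
        scale = 1.0 / (1.0 + lam_rate)
        for i in range(r):
            lam1[i] = rng.gammavariate(y1[i] + u[i] + lam_shape, scale)
            lam2[i] = rng.gammavariate(y2[i] - u[i] + lam_shape, scale)
        if it >= burn_in:
            draws.append(theta)
    return draws


if __name__ == "__main__":
    # Toy counts for three covariate patterns (illustrative only)
    samples = gibbs_theta([30, 45, 60], [10, 12, 15])
    print("posterior mean of theta:", sum(samples) / len(samples))
```

Averaging the retained draws gives the point estimate of θ, mirroring the "averaging" step in the text. With counts in the thousands, as in national data, a chain of this simple form can mix slowly, so longer runs and convergence checks would be needed.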
Mortality data consisting of all deaths due to gastric cancer from 2006 to 2010 were considered in this study. The age standardized rate (ASR) of gastric cancer mortality was 9.69 per 100000 population in 2006, 10.2 in 2007, 9.93 in 2008, 9.76 in 2009 and 9.67 in 2010. According to the Bayesian estimation, in 2006 there was between 3% and 7% misclassification, in which the cause of death was registered as cancer without mention of details while the underlying cause of death was gastric cancer. The estimated percentage of misclassification based on the implemented Bayesian method for the years 2006 to 2010 is shown in Table 1. This percentage was subtracted from the deaths that had been registered as cancer without mention of details and added to the number of deaths due to gastric cancer. After Bayesian correction, the age standardized rate per 100000 population for gastric cancer was estimated to be 10.17 in 2006, 10.51 in 2007, 10.23 in 2008, 10.44 in 2009 and 10.35 in 2010. The age standardized rate of gastric cancer before and after Bayesian correction for 2006 to 2010 is visualized in Figure 1. The number of gastric cancer deaths before and after Bayesian correction of misclassification for the years 2006 to 2010 is shown in Table 1 and its trend is shown in Figure 2.
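The correction step and the rate calculation can be illustrated with a short sketch. It assumes direct age standardization against a set of standard population weights, which the text does not state explicitly. The function names `reallocate` and `age_standardized_rate` and every number in the usage example are hypothetical and are not taken from the registry data.

```python
def reallocate(gastric_deaths, unlabelled_deaths, fraction):
    """Move a fraction of "cancer without label" deaths to gastric cancer.

    Returns the corrected (gastric, unlabelled) death counts.
    """
    moved = fraction * unlabelled_deaths
    return gastric_deaths + moved, unlabelled_deaths - moved


def age_standardized_rate(deaths, population, std_weights, per=100_000):
    """Directly standardized rate: weighted mean of age-specific rates.

    deaths, population: per age group, in the same order
    std_weights: standard population weights (normalized inside)
    """
    if not (len(deaths) == len(population) == len(std_weights)):
        raise ValueError("all inputs must have one entry per age group")
    total_weight = sum(std_weights)
    rate = 0.0
    for d, n, w in zip(deaths, population, std_weights):
        rate += (w / total_weight) * (d / n)
    return rate * per


if __name__ == "__main__":
    # Hypothetical age groups: deaths, population size, standard weights
    deaths = [5, 120, 900, 1400]
    population = [18_000_000, 40_000_000, 9_000_000, 3_000_000]
    weights = [0.30, 0.50, 0.15, 0.05]
    print(age_standardized_rate(deaths, population, weights))
    print(reallocate(100, 200, 0.05))
```

Applying `reallocate` within each age group before calling `age_standardized_rate` would give a corrected rate in the spirit of the adjustment described above.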
Iran’s death registry is subject to misclassification in reporting the underlying cause of death. About 3%-7% of deaths due to gastric cancer are registered as cancer without mention of the type of cancer. After correcting the misclassification error in the death registry data, the number of deaths due to gastric cancer and its age standardized rate both increased. The crude gastric cancer mortality count in Iran had an increasing trend from 2006 to 2010 except for 2008, which might be because of incompleteness of the data; the age standardized rate of gastric cancer, however, decreased from 2007 onward (except for 2008). About two-thirds of gastric cancer cases occur in developing countries[24-27], and rates are generally about twice as high in men as in women. The age standardized rate (ASR) of gastric cancer incidence and mortality per 100000 population based on the GLOBOCAN 2012 report is shown in Table 2. The rates show that the ASR of gastric cancer incidence (15.8 per 100000) and the ASR of gastric cancer mortality (11.7 per 100000) are highest in Asia compared with other continents; they are moderate in Europe and South America and lowest in Northern America and most parts of Africa[3,28].
The age standardized rates of incidence and mortality per 100000 population in different regions of Asia based on the GLOBOCAN 2012 report are shown in Table 3. The incidence and mortality rates are higher in Eastern Asia than in other Asian regions. This region includes China, Japan and South Korea, the three countries with the highest gastric cancer incidence and mortality rates. Gastric cancer is the most frequently diagnosed form of cancer in Iran, with an incidence rate of 15.3 per 100000 and a mortality rate of 12.9 per 100000 population based on the GLOBOCAN 2012 report. A steady decline in gastric cancer incidence and mortality rates has been observed in most countries in Northern America and Europe since the middle of the 20th century[31,32]. In recent years similar decreasing trends have been noted in areas with historically high rates of gastric cancer, including some countries in Asia (Japan, China, and South Korea), Latin America (Colombia and Ecuador), and Europe (Ukraine). This reduction may be due to improved sanitation and antibiotics and a consequent reduction in chronic H. pylori infection. Although the age-adjusted rates have decreased, the crude rates are estimated to rise substantially between 2000 and 2020 because of the increasing size and age of the world population, especially in developing countries[35,36].
Gastric cancer is a major health problem in the world, especially in Asia, so appropriate policies are needed for allocating resources to gastric cancer control and prevention. Achieving this aim requires an accurate registry system, yet there are misclassifications in registering causes of death, especially in developing countries[14,15]. Misclassification of causes of death has been a concern in cancer trend analysis and research on cancer epidemiology for decades. Misclassification error leads to under-estimation of cause specific mortality rates and consequently of the burden of disease, and it influences policy making and health risk prioritization[10-12,37]. In the study of Khosravi et al, validated data from hospital deaths were used to measure the impact of misclassification on rates of cardiovascular disease mortality, but a Bayesian method was not employed. The Bayesian approach has received much attention as a way to correct for misclassification in mortality data. Whittemore and Gong used a Bayesian approach to estimate cervical cancer mortality rates, and Sposto et al developed a maximum likelihood method for assessing the effect of diagnostic misclassification on non-cancer and cancer mortality in atomic-bomb survivors. Stamey et al provided a Bayesian approach that extends the models introduced by Whittemore and Gong and by Sposto et al. They assume that the misclassification parameters are unknown and use prior information on these parameters instead of validation data. They applied their Bayesian approach to estimate the number of deaths due to cancer and non-cancer causes, after correcting for misclassification in registering causes of death, among survivors of the atomic bombings of Hiroshima and Nagasaki. Pourhoseingholi et al extended the models proposed by Stamey et al to re-estimate the rates of cause specific deaths in cancer registry data after correcting for misclassification[25,42,43].
Based on their study of gastric cancer mortality in the Iranian population from 1995 to 2004, there was between 30% and 40% misclassification in recording deaths due to gastric cancer. The current study indicates that the accuracy of death registration in Iran has improved in recent years.
In conclusion, there is an undercount of gastric cancer mortality in the Iranian registration system because of misclassification error in registering causes of death. Although the misclassification rate appears to have been reduced, it remains a major problem. Policy makers who use mortality data to determine priorities for disease control and prevention should therefore take this underreporting into account, and causes of death should be registered more accurately. Increasing data accuracy requires more expert staff, better infrastructure, and powerful hardware and software resources. In the absence of validation data, the Bayesian approach is a good and flexible alternative for reducing the effects of misclassification in registered cancer mortality data.
Mortality data registries are subject to misclassification because some deaths are assigned to causes that cannot be considered the underlying cause of death. For example, if a death due to a specific cancer is registered as cancer without mention of the type of cancer, a misclassification error occurs. The aim of this study is to estimate the rate of misclassification of gastric cancer deaths into the cancer without label group using a Bayesian method, and to re-estimate the rate of gastric cancer mortality in Iran.
In Iran, death registry data are subject to misclassification. Reviewing medical records or conducting verbal autopsies as a practical solution for misclassification is time consuming. The hotspot of this study is the use of a Bayesian method for estimating the rate of misclassification in registering causes of death, which is rapid and cost-effective.
With the Bayesian method, validation data are not needed to estimate the rate of misclassification. Data validation is very costly and time consuming, and in many cases it is not possible to obtain validated data. Implementing the Bayesian method requires only prior information about the misclassification rate.
Since registered mortality data are used for health policy making and for estimating the burden of disease, correcting the misclassification in the death registry system will yield more precise estimates of death rates and of the cause specific burden of disease. Consequently, planning for disease control and prevention will improve.
Misclassification is a lack of agreement between the observed value and the true value in categorical data. The Bayesian method is a statistical approach that assigns a distribution or a probability to events or parameters based on previous experience or expert opinion, and then revises those probabilities and distributions by applying Bayes’ theorem once experimental data are obtained.
This is an interesting piece of research.
Institutional review board statement: The study was reviewed and approved by the research committee of the Research Institute for Gastroenterology and Liver Diseases (Tehran).
Informed consent statement: Hereby it is attested that this manuscript which is submitted for publication in World Journal of Gastrointestinal Oncology has been read and approved by all authors, has not been published, totally or partly, in any other journal.
Conflict-of-interest statement: There are no conflicts of interest to report.
Data sharing statement: No additional data are available.
Manuscript source: Invited manuscript
Specialty type: Gastroenterology and hepatology
Country of origin: Iran
Peer-review report classification
Grade A (Excellent): 0
Grade B (Very good): B
Grade C (Good): C
Grade D (Fair): D, D
Grade E (Poor): 0
Peer-review started: August 26, 2016
First decision: September 27, 2016
Article in press: January 12, 2017
P- Reviewer: Aoyagi K, Deans C, Lee HC, Shen LZ S- Editor: Kong JX L- Editor: A E- Editor: Lu YJ