Prev Chronic Dis. 2013; 10: E38.
Published online Mar 21, 2013. doi:  10.5888/pcd10.120202
PMCID: PMC3607338
Peer Reviewed
Impact of Data Editing Methods on Estimates of Smoking Prevalence, Global Youth Tobacco Survey, 2007–2009
Eugene Lam, MD, MSPH, MSc; Italia Rolle, PhD, RD; Mikyong Shin, DrPH, MPH, RN; and Kyung Ah Lee, MS
Author Affiliations: Italia Rolle, Mikyong Shin, Global Tobacco Control Branch, Office on Smoking and Health, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, Georgia; Kyung Ah Lee, Northrop Grumman Information Systems, Atlanta, Georgia.
Corresponding Author: Eugene Lam, MD, MSPH, MSc, Epidemic Intelligence Service, Office of Surveillance, Epidemiology, and Laboratory Services, and Global Tobacco Control Branch, Office on Smoking and Health, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA. Telephone: 404-718-4294. E-mail: elam/at/cdc.gov.
Accuracy of self-reported data may be improved by data editing, a mechanism to produce accurate information by excluding inconsistent data based on a set number of predetermined decision rules. We compared data editing methods in the Global Youth Tobacco Survey (GYTS) with other editing approaches and evaluated the effects of these on smoking prevalence estimates. We evaluated 5 approaches for handling inconsistent responses to questions regarding cigarette use: GYTS, do-nothing, gatekeeper, global, and preponderance. Compared with GYTS data edits, the do-nothing and gatekeeper approaches produced similar estimates, whereas the global approach resulted in lower estimates and the preponderance approach, higher estimates. Implications for researchers using GYTS include recognition of the survey’s data editing methods and documentation in their study methods to ensure cross-study comparability.
Accurate monitoring of cigarette smoking status among youth is important in addressing the tobacco use epidemic globally (1). However, the accuracy of self-reported health-risk behaviors in questionnaires may be compromised because of difficulties in recall, social desirability, and sensitivity of the question itself (2). Data editing is a mechanism to produce accurate information by excluding inconsistent data based on a set number of predetermined decision rules. Research suggests that editing procedures have potential effects on point estimates and cross-study comparability (3–5). This exploratory study compares the data editing method used in the Global Youth Tobacco Survey (GYTS) with other data editing approaches and evaluates the effect of these on estimates of smoking prevalence in GYTS to inform collaborators globally.
GYTS, a self-administered school-based survey, uses a 2-stage cluster sample design that is grade-based and produces representative samples of students with ages ranging from 10 to 17 years. A subset of students aged 13 to 15 years is used for comparing the data within and across World Health Organization (WHO) regions. In countries where all students in the selected grades are surveyed, such as small islands, a census rather than a 2-stage cluster sample is conducted. The survey methods are described in detail elsewhere (6,7).
Eligible countries were selected on the basis of the following inclusion criteria: a nationally representative sample, recent completion of GYTS (2007–2009), large sample size (≥3,000 participants), and GYTS data publicly released. Of 35 eligible countries that met the inclusion criteria, 1 country from each WHO region was randomly selected for this study. Data analysis was performed on a subset of participants aged 13 to 15 years (n) among all ages in the grades selected for the survey (N). The selected countries and the year GYTS was conducted (values for n and N) are as follows: Ghana, 2009 (n/N = 4,171/8,295); Guatemala, 2008 (n/N = 3,838/5,565); Saudi Arabia, 2007 (n/N = 2,574/3,829); the Philippines, 2007 (n/N = 3,278/5,919); Slovakia, 2007 (n/N = 4,176/4,696); and Thailand, 2009 (n/N = 7,649/9,963).
Some questions from the GYTS presented the opportunity for participants to contradict themselves when responding (Table 1). Self-reported cigarette smoking on 1 or more of the past 30 days was used to determine cigarette smoking status. For this series of questions, 5 approaches were taken for handling inconsistent responses to questions regarding cigarette use: GYTS, do-nothing, gatekeeper, global, and preponderance (Table 1).
Table 1
Selected Global Youth Tobacco Survey (GYTS) Questions and Data Edit Approaches
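The contents of Table 1 are not reproduced in this text, so the decision rules themselves cannot be restated here. As a rough illustration only, the following Python sketch shows how four of the approaches might classify current smokers. The rule definitions are assumed paraphrases of how these terms are commonly used in the data editing literature, the field names are hypothetical rather than actual GYTS variables, and the prevalence is unweighted, whereas the study used survey weights. The GYTS approach is omitted because its logic checks are not listed in the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Response:
    # Hypothetical fields for illustration; not actual GYTS variable names.
    ever_smoked: bool      # gatekeeper item: ever tried a cigarette
    days_smoked_30: int    # days smoked during the past 30 days
    cigs_per_day_30: int   # cigarettes per smoking day, past 30 days


def is_inconsistent(r: Response) -> bool:
    """Assumed rule: never-smokers reporting recent use, or the two
    past-30-day items disagreeing, count as inconsistent."""
    recent_use = r.days_smoked_30 > 0 or r.cigs_per_day_30 > 0
    items_disagree = (r.days_smoked_30 > 0) != (r.cigs_per_day_30 > 0)
    return (not r.ever_smoked and recent_use) or items_disagree


def do_nothing(r: Response) -> Optional[bool]:
    # Take the past-30-day item at face value.
    return r.days_smoked_30 > 0


def gatekeeper(r: Response) -> Optional[bool]:
    # A "never smoked" answer overrides any later report of smoking.
    return r.ever_smoked and r.days_smoked_30 > 0


def global_edit(r: Response) -> Optional[bool]:
    # Exclude (None) any respondent with an inconsistent answer pattern.
    return None if is_inconsistent(r) else r.days_smoked_30 > 0


def preponderance(r: Response) -> Optional[bool]:
    # Classify as a smoker when most items point toward smoking.
    votes = [r.ever_smoked, r.days_smoked_30 > 0, r.cigs_per_day_30 > 0]
    return sum(votes) >= 2


def prevalence(responses: List[Response],
               rule: Callable[[Response], Optional[bool]]) -> float:
    """Unweighted percentage of current smokers among retained respondents."""
    kept = [s for s in (rule(r) for r in responses) if s is not None]
    return 100.0 * sum(kept) / len(kept)


# Made-up respondents, two of them inconsistent.
responses = [
    Response(True, 5, 2),    # consistent current smoker
    Response(True, 0, 0),    # ever smoked, not current
    Response(False, 0, 0),   # never smoked
    Response(False, 3, 1),   # "never smoked" but reports recent smoking
    Response(True, 0, 2),    # 0 days smoked but reports cigarettes per day
]

for rule in (do_nothing, gatekeeper, global_edit, preponderance):
    print(rule.__name__, round(prevalence(responses, rule), 1))
```

The toy data are chosen only to show that the same responses yield different prevalence figures under different rules; they are not meant to reproduce the direction or size of the differences reported in Table 2.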
We used Stata 11 software (StataCorp LP, College Station, Texas) to account for complex survey design and to calculate weighted point estimates and their standard errors (SEs). Estimates with a relative SE (ratio of the SE of the estimate to the estimate, multiplied by 100) greater than 30% were considered statistically unreliable. Adjusted Wald tests were used to test for statistically significant differences between point estimates derived from the GYTS approach and the 4 other data editing approaches. Significance was set at P < .05.
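The relative SE criterion can be written out as a short calculation. The following Python sketch is illustrative only: the study itself used Stata, the function names are hypothetical, and the input values are made up rather than taken from the study.

```python
def relative_se(estimate: float, se: float) -> float:
    """Relative standard error as a percentage: SE / estimate * 100."""
    if estimate == 0:
        raise ValueError("relative SE is undefined for an estimate of zero")
    return 100.0 * se / estimate


def is_unreliable(estimate: float, se: float, threshold: float = 30.0) -> bool:
    """Flag an estimate whose relative SE exceeds the threshold (30% in the text)."""
    return relative_se(estimate, se) > threshold


# Made-up values: a 5.0% prevalence estimate with two candidate SEs.
print(is_unreliable(5.0, 1.0))  # relative SE of 20%, below the threshold
print(is_unreliable(5.0, 2.0))  # relative SE of 40%, above the threshold
```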
Overall response rates of students interviewed (calculated as the school response rate multiplied by the class and student response rates) for all 6 countries were the following: 84.0% (Ghana), 79.6% (Guatemala), 82.1% (Saudi Arabia), 80.9% (Philippines), 86.1% (Slovakia), and 93.1% (Thailand). Data edit approaches resulted in variation of prevalence estimates of cigarette use; estimates ranged from 2.3% to 5.1% in Ghana, 8.9% to 12.4% in Guatemala, 4.9% to 6.5% in Saudi Arabia, 12.3% to 17.0% in the Philippines, 21.6% to 25.0% in Slovakia, and 9.6% to 11.9% in Thailand (Table 2). The global approach resulted in lower estimates and the preponderance approach, in general, higher estimates. The do-nothing and gatekeeper approaches produced estimates similar to those of the GYTS approach. The range and magnitude of differences in estimates derived from the global and preponderance approaches compared with those of the GYTS approach were greater among girls than boys. All comparisons of GYTS estimates were significantly different (P < .05) from estimates derived with the 4 other approaches, with several exceptions (Table 2). Consistent with the overall estimates, the global approach resulted in lower estimates, the preponderance approach higher estimates, and the do-nothing and gatekeeper approaches similar estimates, by sex across all selected countries.
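The overall response rate described at the start of this paragraph is a product of three component rates. A minimal Python sketch follows; the function name and the component values are hypothetical, because the text reports only the overall rates and not the school, class, and student rates separately.

```python
def overall_response_rate(school: float, class_: float, student: float) -> float:
    """Overall rate = school rate x class rate x student rate (proportions in [0, 1])."""
    for rate in (school, class_, student):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("each response rate must be a proportion between 0 and 1")
    return school * class_ * student


# Made-up component rates, expressed as proportions.
rate = overall_response_rate(school=0.9, class_=1.0, student=0.9)
print(f"{rate:.1%}")
```

Because the rates multiply, modest nonresponse at each stage compounds: the overall rate can never exceed the lowest of the three component rates.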
Table 2
Prevalence of Cigarette Use Among Global Youth Tobacco Survey (GYTS) Participants Aged 13–15 Years in Select Countries, by Data Editing Approach
We demonstrated the effect of decision rules for handling data inconsistencies in GYTS data to assist collaborators globally. Smoking prevalence estimates generated from surveys can vary with the data editing approach used. Compared with the GYTS data edits, the global approach resulted in lower estimates and the preponderance approach, higher estimates. It is noteworthy that the do-nothing and gatekeeper approaches produced estimates similar to those of the GYTS data editing method. In comparison to the GYTS approach (7 logic checks), data editing methods in the National Youth Tobacco Survey and Youth Risk Behavior Survey are more extensive (more than 30 logic checks for each), suggesting a need to provide a more comprehensive list of logic checks to account for all possible combinations of inconsistencies in GYTS data (8,9).
This study shows how different ways of removing inconsistent data influence estimates of cigarette smoking prevalence. Clearly described methods for handling inconsistent data are necessary for reproducibility and comparability of GYTS results. Multiple researchers across WHO regions use and publish GYTS data, and accurate comparisons between 2 studies can be made only if the same approach in handling inconsistent data is used. Resolving issues with data inconsistency may include piloting surveys before implementation and incorporating built-in skip patterns if electronic versions of the survey are explored in the future. A limitation of this study is that the list of sampled countries is not representative of, and therefore not generalizable to, all countries conducting GYTS.
Data cleaning and management, as essential aspects of quality assurance and determinants of study validity, require transparency and proper documentation of all procedures (10). Implications for researchers using GYTS include recognition of its data editing approach and documentation in their study methods to ensure cross-study comparability.
Acknowledgments
This project received no funding. None of the authors have a commercial or other financial interest associated with the information presented in this manuscript.
Footnotes
The opinions expressed by authors contributing to this journal do not necessarily reflect the opinions of the U.S. Department of Health and Human Services, the Public Health Service, the Centers for Disease Control and Prevention, or the authors' affiliated institutions.
Suggested citation for this article: Lam E, Rolle I, Shin M, Lee KA. Impact of Data Editing Methods on Estimates of Smoking Prevalence, Global Youth Tobacco Survey, 2007–2009. Prev Chronic Dis 2013;10:120202. DOI: http://dx.doi.org/10.5888/pcd10.120202.
1. Global Tobacco Surveillance System Collaborating Group. Global Tobacco Surveillance System (GTSS): purpose, production, and potential. J Sch Health 2005;75(1):15–24. doi: 10.1111/j.1746-1561.2005.tb00004.x.
2. Brener ND, Billy JO, Grady WR. Assessment of factors affecting the validity of self-reported health-risk behavior among adolescents: evidence from the scientific literature. J Adolesc Health 2003;33(6):436–57. doi: 10.1016/S1054-139X(03)00052-1.
3. Bauer UE, Johnson TM. Editing data: what difference do consistency checks make? Am J Epidemiol 2000;151(9):921–6. doi: 10.1093/oxfordjournals.aje.a010296.
4. Fendrich M, Johnson TP. Examining prevalence differences in three national surveys of youth: impact of consent procedures, mode, and editing rules. J Drug Issues 2001;31(3):615–42.
5. Brittingham A, Tourangeau R, Kay W. Reports of smoking in a national survey: data from screening and detailed interviews, and from self- and interviewer-administered questions. Ann Epidemiol 1998;8(6):393–401. doi: 10.1016/S1047-2797(97)00237-8.
6. Warren CW, Riley L, Asma S, Eriksen MP, Green L, Blanton C, et al. Tobacco use by youth: a surveillance report from the Global Youth Tobacco Survey project. Bull World Health Organ 2000;78(7):868–76.
7. Warren CW, Lea V, Lee J, Jones NR, Asma S, McKenna M. Change in tobacco use among 13-15 year olds between 1999 and 2008: findings from the Global Youth Tobacco Survey. Glob Health Promot 2009;16(2 Suppl):38–90. doi: 10.1177/1757975909342192.
8. Centers for Disease Control and Prevention. National Youth Risk Behavior Survey (YRBS) Data user guide; 2009. ftp://ftp.cdc.gov/pub/data/yrbs/2009/YRBS_2009_national_user_guide.pdf.
9. Office on Smoking and Health. The Youth Tobacco Survey (YTS) handbook. Atlanta (GA): Centers for Disease Control and Prevention; 2011.
10. Van den Broeck J, Cunningham SA, Eeckels R, Herbst K. Data cleaning: detecting, diagnosing, and editing data abnormalities. PLoS Med 2005;2(10):e267. doi: 10.1371/journal.pmed.0020267.