Longitudinal designs enable examination of temporal relationships between exposures and health outcomes, but extended participation can cause study fatigue. We present an approach for analyzing data quality and study fatigue in a participatory, longitudinal study of adolescents.
Participants (N=340) in the Rural Air Pollutants and Children's Health study completed daily diaries for 3-5 weeks in 2009 while we monitored outdoor pollutant concentrations. We used regression models to examine established associations between disease, symptoms, anthropometrics, and lung function as indicators of internal consistency and external validity. We modeled temporal trends in data completeness, lung function, environmental odors, and symptoms to assess study fatigue.
Of 5728 records, 94.2% were complete. Asthma and allergy status were associated with asthma-related symptoms at baseline and during follow-up, e.g., prevalence ratio=8.77 (95% confidence interval: 4.33, 17.80) for awakening with wheeze among diagnosed asthmatics versus non-asthmatics. Sex, height, and age predicted mean lung function. Plots depicting outcome reporting over time and associated linear trends showed time-dependent declines for most outcomes.
We achieved data completeness, internal consistency, and external validity, yet still observed study fatigue, despite efforts to maintain participant engagement. Future investigators should model time trends in reporting to monitor longitudinal data quality.
Although prospective studies can reduce recall bias, conditioning effects such as sensitization and fatigue may affect data quality in longitudinal studies.1-4 Sensitization typically occurs at the beginning of a study, when participants become more aware of their health and over-report behaviors or symptoms.1 Fatigue in the later stages of participation leads to underreporting.1 Reliability and validity of data can be evaluated using internal consistency and criterion validity measures,2 in addition to missing data assessments.
Several studies have documented conditioning effects among adolescents using health diaries.2-4 In The Asthma Daily Diary for Children study (ages 7-12), investigators noted that 64% of respondents reported fatigue with keeping a diary, and that missing data mostly occurred in the final week of follow-up.2 Strickland et al. (2006) followed injury reports from youth aged 9-18 for 13 weeks; although missing data gave no indication of study fatigue, they observed a time-dependent decline in injury reporting.4
Although the risk of bias in longitudinal studies is not necessarily greater for adolescents compared to adults, investigators conducting research with children can minimize errors by keeping children engaged.5 Techniques include involving adolescent participants in questionnaire design5-7; offering encouragement and rewards2; sharing study progress8; and incorporating innovative research methods, including drawings, photographs, participatory techniques, diaries, and worksheets6. Punch (2002) also suggests maintaining confidentiality, developing rapport between researchers and participants, giving comprehensive, unambiguous instructions, avoiding leading questions, and permitting “don't know” responses to avoid guesses.6 Ozer (2010) emphasizes consideration for researcher-participant dynamics in research involving youth of color to avoid disengagement if the common dynamic of white teachers questioning students is replicated in research design.8
The Rural Air Pollutants and Children's Health (RAPCH) study employed many of these approaches during participatory data collection with middle school students in eastern North Carolina (NC). The study was collaboratively designed by researchers from the University of North Carolina at Chapel Hill (UNC-CH) and community partners from the Rural Empowerment Association for Community Help (REACH), a community-based organization seeking to provide economic and environmental justice for residents of rural southeastern NC. We used a longitudinal design to assess acute health effects associated with daily air pollutant concentrations at three middle schools near large-scale livestock facilities that emit particles and gases that can affect respiratory health.9 In NC, 99% of the nearly 10 million swine under production are raised in facilities with over 1000 animals.10 Cross-sectional studies have documented associations between home and school proximity to swine facilities and prevalence of asthma-related illness in children.11-13
In the RAPCH study, adolescents completed their own diaries and recorded their own lung function values during science class. Here we describe our data collection methods and engagement strategies, present an analysis of data quality, and discuss implications for our research aims to inform future longitudinal studies.
REACH staff recruited three public middle schools for the study. Participating schools had 9-56 swine barns and 4-25 poultry barns within two miles. School staff selected science classes for the study based on class size, schedule, and student maturity. Teachers learned the study protocol and confidentiality procedures, but did not collect data.
After a presentation about air pollution and health effects, we described the study to science classes. Students received a packet containing a letter of support from the principal and science teacher plus parental consent forms in English and Spanish. We obtained assent from students who returned forms indicating parental consent. The UNC-CH Institutional Review Board (IRB) reviewed and approved study activities.
We collected data between February and November 2009 in five waves lasting three to five weeks each; three classes comprised each wave. Of 358 students from 15 science classes, 340 (95%) participated. At baseline, participants reported socio-demographic information, exposures to smoking, and exposures to livestock, and answered questions about asthma-related diagnosis/symptoms drawn from a previous school-based study of adolescent asthma in NC, which included the International Study of Asthma and Allergies in Childhood (ISAAC) video questionnaire.14-17 Participants then received binders containing a daily diary and a Mini-Wright Digital (MWD) peak flow meter (Clement Clarke International, Harlow, United Kingdom). Due to supply challenges, some participants received MWDs after they began daily diary completion. We trained participants to use the diaries and peak flow meters, emphasizing accurate and honest reporting. Finally, we measured participant height as an indicator of expected pulmonary function values.
Participants took approximately 10 minutes each day to complete the following steps: 1) report the strength of eleven illness symptoms using a scale of None, Barely There, Present, Strong, Very Strong; 2) record 24-hour odor observations for engine exhaust, livestock, and smoke using the same scale; 3) report asthma and allergy medication use, respiratory-related physician visits, and respiratory-related school absences; 4) record time outside in the previous 24 hours; and 5) measure forced expiratory volume in one second (FEV1) and peak expiratory flow (PEF) three times with their MWD instruments, recording each measurement to supplement electronic data.
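The daily diary steps above can be pictured as one structured record per participant-day. The following is a minimal sketch, not the study's actual data format: the class and field names, the 0-4 numeric coding of the response scale, and the use of `None` for blank items are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional


class Strength(IntEnum):
    """Five-level response scale used for symptoms and odors (0-4 coding is assumed)."""
    NONE = 0
    BARELY_THERE = 1
    PRESENT = 2
    STRONG = 3
    VERY_STRONG = 4


@dataclass
class DiaryRecord:
    """One participant-day (hypothetical layout); None marks an item left blank."""
    participant_id: str
    day_in_study: int
    symptoms: Dict[str, Optional[Strength]] = field(default_factory=dict)
    odors: Dict[str, Optional[Strength]] = field(default_factory=dict)
    fev1_litres: List[float] = field(default_factory=list)    # up to three maneuvers
    pef_l_per_min: List[float] = field(default_factory=list)  # up to three maneuvers


def reported(level: Optional[Strength]) -> Optional[bool]:
    """Dichotomize a response as above "None"; a blank item stays missing."""
    if level is None:
        return None
    return level > Strength.NONE
```

Keeping blank items distinct from a response of "None" matters here, because the analysis treats missing items (completeness) and low responses (reporting levels) as separate signals.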
REACH hired local community members to assist data collection and promote data quality. These community liaisons were former educators who trained in research ethics, learned the study protocol, and were approved by the UNC-CH IRB. Typically, two liaisons were present daily to distribute and collect diaries, monitor use of MWDs, and check diaries for completion. Although participants maintained autonomy in their responses, liaisons briefly checked diary pages and alerted students to blank sections, e.g., skipped pages. After a school absence, students were instructed not to complete missed diary entries. Liaisons also verified air pollution monitors measuring particulate matter less than 10 micrometers in aerodynamic diameter (PM10) and hydrogen sulfide (H2S) inside and outside of schools.
We incorporated concepts of scientific inquiry, technological design, air pollution, and the human respiratory system into research activities to complement the NC Standard Course of Study.18 We also demonstrated air monitoring instruments and MWD data downloading. Using preliminary results, students practiced interpreting summary statistics and generated graphs. To encourage participation, we provided incentives at the student, teacher, and school levels.
We analyzed data for students who completed both a baseline survey and a diary (N=340). We considered each diary entry a record, with a maximum of 25 records per participant (5 days/week for up to 5 weeks). Of 6249 records collected, we excluded 521 with >50% of items missing as presumed school absences; 77% of these excluded records were marked “absent” by community liaisons and may have been completed retrospectively by participants. We conducted analyses on the 5728 remaining records.
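The exclusion rule described above (drop records with more than 50% of items missing) can be sketched as follows. This is an illustrative sketch only: each record is assumed to be a sequence of item values with `None` for a blank item, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Record = Sequence[Optional[object]]


def fraction_missing(items: Record) -> float:
    """Share of diary items left blank (None) in one record."""
    if not items:
        return 1.0  # an empty record is treated as fully missing
    return sum(item is None for item in items) / len(items)


def split_records(
    records: Sequence[Record], threshold: float = 0.5
) -> Tuple[List[Record], List[Record]]:
    """Separate analyzable records from presumed absences (> threshold missing)."""
    kept: List[Record] = []
    excluded: List[Record] = []
    for rec in records:
        if fraction_missing(rec) > threshold:
            excluded.append(rec)
        else:
            kept.append(rec)
    return kept, excluded
```

Note that the comparison is strict, so a record with exactly half of its items missing is retained, matching the ">50%" wording.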
We examined data completeness by tallying missing items for each record. For lung function data, we tallied missing written values separately for records with and without stored electronic data, since written values without electronic data may be fabricated. We then computed the number and percent of complete records overall and by diary section.
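As a rough illustration of the tallying step, the sketch below counts records with zero, one, or more than one missing item and converts counts to percentages. The record layout (`None` for a blank item) and the function names are assumptions, not the study's code.

```python
from typing import Dict, Optional, Sequence


def tally_missing(records: Sequence[Sequence[Optional[object]]]) -> Dict[str, int]:
    """Count records that are complete, missing one item, or missing several."""
    counts = {"complete": 0, "one_missing": 0, "more_than_one_missing": 0}
    for items in records:
        n_missing = sum(item is None for item in items)
        if n_missing == 0:
            counts["complete"] += 1
        elif n_missing == 1:
            counts["one_missing"] += 1
        else:
            counts["more_than_one_missing"] += 1
    return counts


def as_percent(count: int, total: int) -> float:
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)
```

For instance, `as_percent(5395, 5728)` returns 94.2, the overall completeness reported for this study. The same tally can be repeated per diary section by passing only that section's items.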
We defined five categories of asthma-related disease at baseline using established definitions from the ISAAC study and a previous school-based study conducted state-wide in NC.19
We define internal consistency as the observation of expected relationships between variables within our data set that measure similar traits.2 We used bivariable log-binomial models to estimate associations between 1) allergy status and the prevalence of two asthma outcomes (diagnosed asthma and diagnosed current asthma) and 2) asthma-related disease and the prevalence of ISAAC symptoms. We also used Poisson-distributed generalized estimating equations to estimate within-person associations between asthma-related disease and the daily prevalence of four symptoms commonly associated with asthma and allergy: wheeze, shortness of breath, tightness in chest, and runny nose.14,20-22 We used bivariable linear models to examine: 1) sex, age, height, race/ethnicity, and asthma-related disease as predictors of participant mean lung function values from the first two days of follow-up,23-25 and 2) frequent livestock exposure as a predictor of mean livestock odor from the first two days of follow-up. We limited longitudinal outcome data to the first two days of follow-up for consistency with analyses, discussed below, that used this time period as the referent group for changes in reporting over time. For these analyses, we present linear model beta coefficients (β) and their 95% confidence intervals (CI) to indicate the magnitude and precision of effect estimates.
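For a single binary predictor, the prevalence ratio from a bivariable log-binomial model equals the crude ratio of the two group prevalences, and a Wald interval on the log scale should match the usual delta-method formula. The sketch below computes both from a 2x2 table. It is a simplified stand-in rather than the fitted models used here, it does not cover the generalized estimating equations used for daily symptoms, and the function name and counts are hypothetical.

```python
import math
from typing import Tuple


def prevalence_ratio(
    cases_exposed: int,
    n_exposed: int,
    cases_unexposed: int,
    n_unexposed: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Crude prevalence ratio with a Wald confidence interval on the log scale."""
    p1 = cases_exposed / n_exposed
    p0 = cases_unexposed / n_unexposed
    pr = p1 / p0
    # Delta-method standard error of log(PR)
    se_log = math.sqrt(
        1 / cases_exposed - 1 / n_exposed + 1 / cases_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(pr) - z * se_log)
    upper = math.exp(math.log(pr) + z * se_log)
    return pr, lower, upper


# Hypothetical counts for illustration only (not study data)
pr, lower, upper = prevalence_ratio(20, 50, 10, 100)
```

Because the interval is symmetric on the log scale, the point estimate is the geometric mean of the two limits, which is a quick check when reading reported intervals.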
In the absence of criterion validity measures that would enable comparison of survey responses with external sources such as medical records,2 we define external validity as the comparability of associations in our study with those from other relevant studies. We assessed external validity by comparing associations between variables predictive of lung function in our data with associations expected based on the lung function literature. We also compared the prevalence of asthma and allergy in our study population with published results from previous surveillance of adolescents in NC.
Finally, we used linear and logistic fixed effects models26 to determine whether the daily percentage of complete items or levels of response (e.g., lung function values, level of odor observed, level of symptoms present) varied with follow-up time (day-in-study). Fixed effects models estimate average within-person associations and control for measured and unmeasured time-invariant confounders such as sex, race/ethnicity, age, and asthma diagnosis.26 We categorized day-in-study to stabilize estimates and used indicator variables for day-in-study categories to compute beta coefficients representing change from Day 1-2 (reference category). We stratified these analyses by wave to account for differences in location, season, and length of follow-up, plotting beta coefficients and generating linear regression terms by wave. Outcomes assessed included percentage of complete items within records, mean FEV1, mean PEF, frequency of odor and symptom reports above “None”, and 24-hour mean outdoor pollutant concentrations.
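To make the within-person logic concrete, the sketch below estimates a linear time trend with the standard "within" (person-demeaned) estimator for a single regressor. It covers only the linear-outcome, linear-trend case, not the categorical day-in-study indicators or the logistic models, and the function name and input layout are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def within_person_slope(rows: Iterable[Tuple[str, float, float]]) -> float:
    """Fixed effects slope of outcome on day-in-study.

    rows: (participant_id, day_in_study, outcome) triples.
    Demeaning both variables within each participant removes every
    time-invariant person characteristic before the slope is computed.
    """
    by_person: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for pid, day, outcome in rows:
        by_person[pid].append((day, outcome))

    sxy = 0.0
    sxx = 0.0
    for obs in by_person.values():
        mean_day = sum(d for d, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for d, y in obs:
            sxy += (d - mean_day) * (y - mean_y)
            sxx += (d - mean_day) ** 2
    if sxx == 0:
        raise ValueError("no within-person variation in day-in-study")
    return sxy / sxx
```

A negative slope for an outcome such as FEV1, alongside no comparable trend in measured pollutants, is the pattern that would point toward study fatigue. Running the function separately on each wave's rows mirrors the stratification by wave.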
Participant characteristics (N=340) are presented in Table 1. Most participants (87%) were in 7th or 8th grade. Approximately two-thirds reported participation in the federal free or reduced-price lunch program. Using mutually exclusive categories of student-reported race/ethnicity, 28% of students were Black, 31% were Hispanic, 35% were White, and 7% were Other. Although 62 (19%) students reported previously diagnosed asthma, only 34 (10%) were classified as diagnosed current asthmatics. Additionally, 48 (15%) had current wheeze without asthma diagnosis and 13 (4%) had frequent wheeze without asthma diagnosis. Ninety-eight students (29%) reported allergies.
We observed high data completeness within diary records (Supplemental material, Table S1). Of 5728 diary records, 5395 (94.2%) had all items completed, 215 (3.8%) had one item missing, and 118 (2.1%) had more than one item missing. By section, odor reports were most often missing, although completeness was still very high (97.6%). Completeness of written lung function data was excellent (99.4%) for records with electronic MWD data (N=5035). Among records with no electronic MWD data (N=693), however, 476 (68.7%) unexpectedly had written responses, after excluding known cases of instrument malfunction.
At baseline, having allergies was associated with diagnosed asthma (Prevalence ratio (PR)=2.72, 95% CI: 1.74, 4.26, data not shown) and each of the ISAAC symptoms (PR range of 1.55-3.15, Table 2), yet was also associated with a decreased prevalence of diagnosed current asthma (PR=0.86, 95% CI: 0.78, 0.95, data not shown). Diagnosed asthma was associated with even higher prevalence of ISAAC symptoms (PR range of 2.42-8.77, Table 2). Of the five ISAAC symptoms, being awakened by wheeze had the greatest magnitude of association with both allergy and diagnosed asthma.
We report associations between asthma-related disease status at baseline and the prevalence of symptom reports during daily follow-up in Table 3. Diagnosed current asthma was associated most strongly with reports of wheeze (PR=6.02, 95% CI: 3.28, 11.03) and shortness of breath (PR=1.95, 95% CI: 1.07, 3.53). Frequent wheeze without asthma diagnosis was most strongly associated with shortness of breath (PR=2.94, 95% CI: 1.25, 6.90) and tightness in chest (PR=3.09, 95% CI: 1.01, 9.40), although estimates were imprecise due to few participants in this category. Allergies were also most strongly associated with wheeze (PR=2.37, 95% CI: 1.27, 4.41).
In Table 4, we present results from bivariable models of baseline characteristics predicting mean FEV1 and PEF during the first two days of follow-up. Male sex, age, and height were all associated with increased mean FEV1 and PEF, indicated by positive beta coefficients and relatively small standard errors. For example, FEV1 was 0.39 L greater (95% CI: 0.25, 0.53) and PEF was 44.54 L/min greater (95% CI: 26.92, 62.16) for males than females. Compared to a combined White/Other reference group, Black participants had lower FEV1 values (β = -0.28, 95% CI: -0.46, -0.10). Other differences in FEV1 or PEF for Black or Hispanic participants compared to White/Other were positive but imprecise. Although we observed negative beta coefficients for diagnosed asthma and allergies predicting lung function, results were small in magnitude and imprecise. Finally, frequent livestock exposure was associated with increased mean livestock odor score (β = 0.29 units on a 5-unit scale, 95% CI: 0.15, 0.44, results not shown), although eight participants with frequent livestock exposure never reported any livestock odor.
The results of Table 4 also contribute to our assessment of external validity, in that average lung function values were predicted by sex, age, and height.23-25 We found the same prevalence of asthma diagnosis (19%) as reported in the 2009 NC Youth Risk Behavior Surveillance survey.27 Compared with the North Carolina School Asthma Survey (NCSAS) conducted in 1999-2000, we found the same prevalence of current diagnosed asthma (10%) and a similar prevalence of frequent wheeze without asthma diagnosis (RAPCH 4%, NCSAS 6%).19
In Figure 1a-d, we present the parametric forms for four selected outcome measures by grouped day-in-study, including data completeness, the most clinically relevant lung function parameter (FEV1), the most commonly reported symptom (runny nose), and the most relevant reported odor (livestock). Plotted beta coefficients represent the change in values using day-in-study 1-2 as the reference category. Diary completeness initially increased, then remained fairly constant for most waves; linear trends were positive for four of five waves (Figure 1a). Self-measured FEV1 declined over time for all waves except wave 3, as shown by plotted values and linear trends (Figure 1b). We also observed temporal declines in the log odds of reporting livestock odor and runny nose; all waves had negative linear trends (Figures 1c and 1d). We observed similar temporal declines and negative linear trends for most symptoms, odors, and lung function parameters across waves (Supplemental Table 1 and Supplemental Table 2). In contrast, outdoor 24-hour mean PM10 and H2S concentrations varied daily without consistent linear trends (Figure 2a-b).
A key strength of longitudinal designs is the ability to examine temporal relationships between exposures and outcomes, but this approach remains vulnerable to time-varying measurement error that may result from study fatigue. During and immediately following data collection waves, we assessed data completeness, internal consistency, and external validity. We did not assess changes in measurement over follow-up time, however, until we observed paradoxical relationships during initial longitudinal analyses (e.g., increased pollutant levels associated with increased lung function). This prompted the more detailed data quality examination presented here. Future investigators should conduct similar checks during longitudinal data collection to promptly address study fatigue if it occurs.
The observed associations between asthma-related outcomes at baseline and symptom reports during follow-up suggested internal consistency in our study. The observed relationships between physical characteristics and lung function represent internal consistency as well as external validity.23-25 Further, baseline prevalence of asthma-related outcomes suggested external validity when compared with previous statewide surveillance.
We observed declines in numerous self-measured or self-reported outcomes over time, however. We considered several explanations, such as day-of-week and time-of-day effects. Based on the figures, we concluded that a day-of-week effect was not evident since we did not see periodic spikes or leveling values coinciding with the start of a new week. To our knowledge, the main temporal effect on lung function over short time periods results from circadian changes, typically with lowest lung function values in the morning and highest lung function values in the afternoon.28 We avoided confounding by time of day by having participants complete their maneuvers at the same time each day. Interestingly, the wave that showed the least decline over time was Wave 3, which was conducted at the beginning of the school year, as requested by the principal for maximum student engagement.
It is possible that sensitization to odor and outcome reports contributed to a relative increase in reports at the beginning of the study; even with this possibility, continued declines during several weeks of follow-up led us to conclude that fatigue occurred. Our findings are consistent with those of Strickland et al., who found a lack of time-related recording errors, e.g., missing data or partial missing data, but observed decreases in injury reporting over time in a repeated-measures study with adolescents.4 Although two prior studies reported an association between study length and missing data2,4 and one recommended limiting follow-up with children to <8 weeks,2 we observed declines within the first week for most measures. Our simple survey, designed for quick completion, may have become monotonous during follow-up. We recommend considering daily coaching and more engaging response formats, such as brief daily interviews or coloring responses to survey questions, as well as shorter follow-up. To our knowledge, no additional studies have reported time-related data quality concerns in longitudinal studies with adolescents.
Lung function may be especially difficult for children to measure independently. These maneuvers are known to be highly dependent on effort,29 which can improve with coaching. One study assessing peak flow meter technique among asthmatic children found that only 24% of participants independently completed all steps correctly.30 Other studies of children conducting lung function measurements independent of a respiratory therapist or other trained technician have observed decreased protocol compliance over time.31,32 Conversely, a study of respiratory health among school children in the Brazilian Amazon maintained quality control for participants' PEF measurements via daily supervision from trained research staff.33 We found that 68.7% of entries without electronic measurements had written lung function values, excluding cases of instrument malfunction. We knew that a subset of records would not have electronic MWD data because some students received instruments after participation began. Thus, these written values may have been fabricated to comply with the expectation of turning in complete school assignments. Alternatively, some participants were reluctant to perform lung function maneuvers in front of peers and may have written values without completing maneuvers.
We previously documented that RAPCH participants experienced academic enhancement and increased environmental health awareness during data collection.34 We had hoped that our participatory approach would engage participants sufficiently to avoid possible detriments to data quality during a long, involved study, based on previous success during a longitudinal study with adults.35 To maximize engagement, we consulted with former middle-school educators during study design and development of materials. RAPCH community liaisons were local residents and people of color, characteristics that facilitated strong rapport with participants during daily data collection. We used illustrations to explain the study protocol. Perhaps we could have presented results more frequently, prompted recall of recent observations prior to diary completion, or increased the variation in daily activities to maintain engagement further.
Although we achieved data completeness, internal consistency, and external validity with our data, we also observed time-related decreases in measurement during follow-up that indicated study fatigue. Study fatigue occurred despite considerable efforts to maintain participant engagement in the protocol.
Investigators planning longitudinal studies should weigh protocol details carefully and consider the approach described here to monitor reporting trends during follow-up and ensure high-quality data.
Supplemental Table 1. Linear trends from logistic fixed effects models for change in symptom and odor reports during follow-up by data collection wave.
Supplemental Table 2. Linear trends from linear fixed effects models for change in lung function measurements during follow-up by data collection wave.
David Leith, Maryanne Boundy, and Karin Yeatts helped to design the study and contributed to field work. William H. Frederick, Lenon Hickman, Patricia Mason, Revenda Ross, Bryce Koukopoulos, Eileen Gregory, Steve Hutton, and Christopher Heaney provided essential study support. We are especially indebted to school staff members and participating students for their hard work during data collection. Preliminary results were presented at the Environmental Health Disparities and Environmental Justice Meeting at the National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, in July 2013, and at the 35th Annual Minority Health Conference entitled Innovative Approaches to Youth Health: Engaging Youth in Creating Healthy Communities at the University of North Carolina, Chapel Hill, North Carolina, in February 2014. Funding support for this research was provided by The Johns Hopkins Center for a Livable Future at the Bloomberg School of Public Health and the National Institute of Environmental Health Sciences Training Grant #2T32ES007018-36.