The Internet offers a promising channel to conduct smoking cessation research. Among the advantages of Internet research are the ability to reach large numbers of participants who might not otherwise take part in a cessation trial and the ability to conduct research efficiently and cost-effectively. To leverage the opportunity of the Internet in clinical research, it is necessary to establish that measures of known validity used in research trials are reliable when administered via the Internet. To date, no published studies have examined the psychometric properties of Internet-administered measures of smoking variables and psychosocial constructs related to cessation (e.g., stress, social support, quit methods). The purpose of the present study was to examine the reliability of measures of previous quit methods, perceived stress, depression, social support for cessation, smoking temptations, alcohol use, perceived health status, and income when administered via the Internet. Participants in the present study were enrolled in a randomized controlled trial of the efficacy of an Internet smoking cessation program. Following the baseline telephone assessment and randomization in the parent trial, participants were recruited to the reliability substudy. An e-mail was sent 2 days after the telephone assessment with a link to the Internet survey and instructions to complete the survey that day. Of the 297 individuals invited to participate, 213 completed the survey within 1 week. Results indicate that the internal consistency and test–retest reliability of the measures examined are comparable whether the measures are self-administered via the Internet or interviewer-administered via telephone.
With approximately 70% of Americans having Internet access, the Internet offers a promising channel to conduct health behavior change research (Rainie & Horrigan, 2005). Among the advantages of conducting research on the Internet are the ability to reach large numbers of people who might not otherwise participate in clinical trials and the ability to conduct research efficiently and cost-effectively. However, to leverage the opportunities for clinical research provided by the Internet, it is necessary to establish that measures known to be valid are reliable when administered via the Internet.
One concern about using the Internet for data collection in clinical trials is that data may be unreliable because researchers lack control over the environment in which data are collected. However, a growing body of evidence suggests that the reliability and validity of data obtained using questionnaires administered via the Internet are generally consistent with results obtained through paper-and-pencil questionnaires (Davis, 1999; Gosling, Vazire, Srivastava, & John, 2004; Ritter, Lorig, Laurent, & Matthews, 2004). Cross-method consistency has been demonstrated for numerous psychological and behavioral constructs, including self-esteem (Robins, Trzesniewski, Tracy, Gosling, & Potter, 2002), self-trust (Buchanan & Smith, 1999; Foster, Campbell, & Twenge, 2003), reaction time (McGraw, Tew, & Williams, 2000), depression (Lin et al., 2003), personality (Buchanan & Smith, 1999; Davis, 1999), and a variety of measures related to health status and health behaviors (Ritter et al., 2004).
Smoking cessation trials often include measures of co-occurring conditions and other factors related to cessation, including stress, depression, self-efficacy, social support, perceived health status, alcohol use, and previous quit methods. To date, there are no published studies comparing these measures when administered via the Internet and telephone. The present study examined the internal consistency and test–retest reliability of these measures in the context of a randomized controlled trial of the efficacy of Internet smoking cessation.
Participants were enrolled in a parent study that is an ongoing randomized controlled trial of the efficacy of an Internet smoking cessation program (www.quitnet.com) and telephone counseling. Recruitment leveraged QuitNet’s top-ranked position on major search engines. Smokers in the United States were recruited into the parent trial if they used the terms quit(ting) smoking or stop(ping) smoking in a search engine query on one of four major search engines (AOL, MSN, Yahoo!, Google), and had not previously used QuitNet (no cookie detected). When an Internet user meeting these criteria clicked on a link to QuitNet in the results of a search engine query, they were shown an intercept page inviting them to participate in the parent study (“Do you want to quit smoking? Would you consider joining a Federally funded research study to evaluate QuitNet? All quit smoking services will be free as part of the study.”). If they accepted, they were asked 10 eligibility screening questions (age, smoking rate, age of first puff, time to first cigarette after waking, number of quit attempts in past year, gender, race, education, zip code, and prior use of QuitNet website). Three questions determined preliminary eligibility (aged ≥18 years, ≥5 cigarettes/day, and no prior QuitNet use); remaining questions were used to characterize the largest possible denominator of potential study participants. Eligible participants were asked to provide online informed consent, their name, and contact information.
Within 48 hr, the participant was scheduled for the baseline telephone assessment, during which eligibility and informed consent were confirmed and a battery of assessments was administered via telephone interview. Participants were randomized to treatment, and then invited to participate in the Reliability Substudy. Subjects who consented to participate in the substudy were sent an e-mail 2 days after their telephone assessment that contained a link to the online survey. Each participant’s unique study identification number was embedded into the link to the online survey so that responses could be joined with their telephone survey data. Participants were paid US$15 for completing the online survey.
The baseline telephone assessment for the parent trial consisted of measures assessing demographic, smoking, and psychosocial characteristics. Brief measures with known psychometric properties were selected to minimize respondent burden. The online reliability survey consisted of the following subset of the measures administered during the baseline telephone assessment.
Participants were asked whether they had ever used the following methods to quit smoking: cold turkey, pamphlet or book, individual counseling, group counseling, nicotine patch, nicotine gum, nicotine nasal spray, nicotine lozenge, nicotine inhaler, bupropion, switching to chewing tobacco or snuff, an Internet program (not QuitNet), telephone counseling, acupuncture, hypnosis, or any other method not listed.
The 9-item version of the Smoking Temptations Questionnaire (Velicer, DiClemente, Rossi, & Prochaska, 1990) examined the degree to which various situations were tempting, a component of the construct of self-efficacy. Each item is rated on a 5-point scale ranging from 1=not at all tempting to 5=extremely tempting. The questionnaire can be scored to form a total score, as well as three subscales measuring temptations in positive affect or social situations, negative affect situations, and habitual or craving situations. Cronbach’s alpha reliability coefficients for the three subscales are .84 (positive/social), .92 (negative affect), and .82 (habit/craving).
The Perceived Stress Scale (PSS; Cohen, Kamarck, & Mermelstein, 1983) assesses the degree to which participants find their lives unpredictable and uncontrollable. Stress has been implicated in difficulty quitting smoking and in relapse (Cohen et al., 1989). Each item is rated on a 5-point scale indicating how frequently the individual has felt a particular way during the past month: never (0), almost never (1), sometimes (2), fairly often (3), and very often (4). A 4-item version has been validated, with Cronbach alpha reliability coefficients ranging from .60 (Cohen & Williamson, 1988) to .72 (Cohen et al., 1983). Test–retest correlations range from .85 over 2 days in a college sample to .55 over 6 weeks in a smoking cessation sample (Cohen et al., 1983).
The 10-item Center for Epidemiologic Studies Depression (CES-D) Scale measures symptoms of current depression (Andresen, Malmgren, Carter, & Patrick, 1994). Scores on the CES-D have been positively associated with smoking prevalence and intensity and with failure to quit in representative samples of U.S. adults (Anda et al., 1990). Each item is rated on a 4-point scale indicating the frequency of occurrence during the past week. Response options were modified to less than 1 day (0), 1–2 days (1), 3–4 days (2), and 5–7 days (3). Test–retest correlations range from .21 to .84, with an overall correlation of .71, at an average interval of 22 days (Andresen et al., 1994).
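As an illustration of how the two symptom scales above are totaled, the sketch below sums Likert responses coded from 0 and reverse-scores positively worded items, so that a higher total always means more perceived stress or more depressive symptoms. The function name `score_scale` is hypothetical and the responses are made up. The reverse-keyed positions are assumptions based on the commonly published item order (PSS-4 items 2 and 3; CES-D-10 items 5 and 8) and should be checked against the exact instrument versions administered.

```python
def score_scale(responses, n_items, max_score, reverse_items=()):
    """Sum Likert responses coded 0..max_score.

    reverse_items holds 1-based item positions that are positively worded
    and therefore reverse-scored (max_score - value).
    """
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    total = 0
    for position, value in enumerate(responses, start=1):
        if not 0 <= value <= max_score:
            raise ValueError(f"response {value} is outside 0..{max_score}")
        # Reverse positively worded items so a higher total always means more symptoms.
        total += max_score - value if position in reverse_items else value
    return total


# PSS-4: four items scored 0-4; items 2 and 3 assumed to be positively worded.
pss = score_scale([3, 1, 1, 3], n_items=4, max_score=4, reverse_items=(2, 3))
# CES-D-10: ten items scored 0-3; items 5 and 8 assumed to be positively worded.
cesd = score_scale([1, 0, 2, 1, 3, 0, 1, 0, 2, 1], n_items=10, max_score=3,
                   reverse_items=(5, 8))
print(pss, cesd)  # 12 11
```

Keeping the reverse-keyed positions as a parameter, rather than hard-coding them, makes it easy to adapt the sketch if a different item ordering was fielded.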
The Partner Interaction Questionnaire (PIQ; Cohen & Lichtenstein, 1990) consists of 20 items that measure specific behaviors of a spouse relevant to smoking cessation: 10 positive and 10 negative behaviors by the partner concerning the participant's smoking. Supportive, cooperative behaviors (e.g., expressing pleasure at the smoker's efforts to quit) predict successful quitting (Coppotelli & Orleans, 1985; Mermelstein, Lichtenstein, & McIntyre, 1983), and negative behaviors (e.g., nagging, criticizing) predict relapse (Cohen & Lichtenstein, 1990; Roski, Schmid, & Lando, 1996). Participants rate how frequently their partner will behave in a particular way on a scale of never (0), almost never (1), sometimes (2), fairly often (3), and very often (4). The PIQ has been modified to measure the receipt of specific behaviors from the person who follows the participant's quitting most closely, not just a partner (Gruder et al., 1993; McMahon & Jason, 2000). We selected the three positive and three negative items that loaded most strongly in a factor analysis (R. J. Mermelstein, personal communication, November 9, 2004). The three positive items were express pleasure at your efforts to quit, congratulate you for your decision to quit smoking, and express confidence in your ability to quit/remain quit. The three negative items were mention being bothered by smoke, ask you to quit smoking, and criticize your smoking. Cronbach alpha coefficients were .92 for the three-item positive subscale and .84 for the three-item negative subscale.
Alcohol use is a common barrier to cessation (McClure, Wetter, de Moor, Cinciripini, & Gritz, 2002; Shiffman et al., 1996). Participants were asked if they drank alcohol at all. Those who said yes were asked to indicate how many days per week they drank alcohol on average, how many drinks they typically had, and the maximum number of drinks they had on one occasion during the past month. In addition, two questions asked, “In the last year, have you had more to drink than you meant to?” and, “In the last year, have you felt you wanted or needed to cut down on your drinking?” These items were adapted from a two-item screening measure to assess conjoint alcohol and drug use (Brown, Leonard, Saunders, & Papasouliotis, 2001). These items have high specificity (80%–90%) to detect current alcohol problems.
Participants were asked to rate their current health status on a 5-point scale from 1 (excellent) to 5 (poor). Participants were also asked if they had ever had smoking-related illness (yes/no). Total household income during the past year was assessed with eight response options (Table 3).
The survey questions were presented on nine separate screens. All questions were closed-ended and had to be answered before the participant could advance to the next screen. Consistent with usability guidelines from the National Cancer Institute (usability.gov), a status bar (e.g., Page 3 of 9) appeared at the top of each page, and most questions appeared above the fold so that minimal scrolling was necessary. Multiple-choice questions appeared in a grid, with one question per row and one response option per column; alternate rows were highlighted for ease of reading.
The first set of analyses documents the recruitment process and describes the recruited sample. To examine the generalizability of the final sample, we compared survey completers with noncompleters on a range of demographic, smoking, and psychosocial variables. Frequency tables are used to summarize the categorical data, and parametric and nonparametric tests are employed to determine statistical significance.
The comparison of the Internet- and phone-administered measures is based on assessing the interrater reliability for these two survey methods and the presence of any systematic bias, as manifested by mean differences for continuous variables and prevalence differences for categorical variables. Table 1 provides sample means and standard deviations for all continuous variables. To allow for the presence of outliers in the data, t-tests for the mean difference between Internet- and phone-administered measures were supplemented with more robust nonparametric tests of location based on the Wilcoxon statistic. Additionally, standardized mean differences allowed us to use post-hoc power calculations to distinguish clinically from merely statistically significant results. This study has at least 80% power at the 5% level of significance to detect an effect size of delta=.20 using N=219 subjects (the entire sample) and of delta=.23 using n=152 subjects (those known to have been alcohol users). According to Cohen (1988), these are small effect sizes, which—while likely to be statistically significant in our study—may have less practical importance than moderate effect sizes in the delta=.50 range.
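The power statements above can be approximately reproduced with a normal approximation for a paired (within-subject) comparison. This sketch assumes the standardized effect size delta is the mean Internet-minus-phone difference divided by the standard deviation of the differences; the paper does not state its exact power formula, so small discrepancies with its figures are to be expected. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def paired_power(delta, n, alpha=0.05):
    """Approximate two-sided power of a paired test for standardized difference delta.

    Normal approximation; the negligible chance of rejecting in the wrong
    direction is ignored.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(delta * sqrt(n) - z_crit)


def min_detectable_delta(n, alpha=0.05, power=0.80):
    """Smallest standardized difference detectable with the requested power."""
    return (_STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)) / sqrt(n)


# Sample sizes quoted in the text: the entire sample and the known alcohol users.
print(round(paired_power(0.20, 219), 2))    # 0.84
print(round(min_detectable_delta(152), 2))  # 0.23
```

Under this approximation, power at delta = .20 with N = 219 is about .84, consistent with the "at least 80%" claim.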
In Table 1, we also examined the test–retest reliability of all continuous variables across survey methods using the intraclass correlation coefficient (ICC) between a single rating obtained via the Internet and a single rating of the same measure obtained over the phone (formula 3.1 of Shrout & Fleiss, 1979). Interrater reliability above 80% is usually sought in method comparisons, with 70% considered an acceptable value.
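A minimal sketch of the ICC(3,1) computation (two-way mixed model, consistency definition, single rating) from the ANOVA mean squares is shown below, with subjects as rows and survey methods as columns. The function name is hypothetical. As a check, it uses the 6-target, 4-judge example data widely reproduced from Shrout and Fleiss (1979), for which ICC(3,1) is about .71.

```python
def icc_3_1(ratings):
    """ICC(3,1): two-way mixed model, consistency, single rating.

    ratings: one row per subject, each row holding k ratings
    (here k = 2 columns, Internet and phone).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between methods
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    bms = ss_rows / (n - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems)


# 6 targets x 4 judges example data.
sf = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
      [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_3_1(sf), 2))                 # 0.71
# A constant offset between the two methods does not lower ICC(3,1).
print(icc_3_1([[1, 2], [2, 3], [3, 4]]))     # 1.0
```

The second call illustrates a design point: because ICC(3,1) removes the between-method (column) effect, a systematic Internet-versus-phone shift would not lower it. This is why mean differences were tested separately, as described above.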
Table 2 provides the prevalence of all dichotomous variables. Prevalence differences between the paired binary indicators contributed by each subject are tested using McNemar’s test of marginal homogeneity. It is noteworthy that its power is driven entirely by the number of subjects with discordant reports (N_D), rather than the total sample size. Therefore, in Table 2 the number of subjects (N) contributing data on both surveys is supplemented by N_D, the actual sample size for each variable. Effect sizes for McNemar’s test have been defined by Cohen (1988) as small for g=0.05, moderate for g=0.15, and large for g=0.25, where g is the absolute difference from 50% in the proportion of discordant pairs that endorse the Internet over the phone-administered measure. Detectability with 80% power at the 5% significance level requires that the observed number of discordant reports exceeds N_D=140, 79, and 23, respectively.
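The dependence of McNemar's test on discordant pairs alone is easiest to see in its exact binomial form, sketched below: concordant pairs never enter the calculation. The paper does not say whether the exact or the chi-square version was used, so this is an illustration only. The function name and counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar test.

    b: pairs positive on the Internet survey only.
    c: pairs positive on the phone survey only.
    Only the N_D = b + c discordant pairs enter the test.
    """
    n_d = b + c
    if n_d == 0:
        return 1.0
    k = min(b, c)
    # Under H0, each discordant pair falls either way with probability 0.5.
    tail = sum(comb(n_d, i) for i in range(k + 1)) / 2 ** n_d
    return min(1.0, 2 * tail)


print(mcnemar_exact(10, 0))  # 0.001953125
print(mcnemar_exact(5, 5))   # 1.0
```

Ten discordant pairs all favoring one method give p ≈ .002 no matter how many concordant pairs the sample contains.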
Interrater agreement for binary variables is typically assessed using kappa coefficients (Cohen, 1960). Since their range is constrained by differences in prevalence between the dichotomous measures, caution should be exercised in their interpretation when the associated McNemar’s test is significant (Cook, 1998). In the absence of prevalence differences, standard cutoffs for measuring agreement using kappa coefficients have been established by Landis and Koch (1977), which rate them as follows: .80–1.00=Almost Perfect, .60–.80=Substantial, .40–.60=Moderate, .20–.40=Fair, .00–.20=Slight, and <.00=Poor. Confidence intervals were based on the profile variance method of Lee and Tu (1994).
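For reference, Cohen's kappa compares observed agreement with the agreement expected from the two methods' marginal distributions. The sketch below computes it from a square agreement table (Internet categories as rows, phone categories as columns) and maps a value onto the Landis and Koch labels listed above. Assigning values that fall exactly on a boundary to the higher band is an assumption, since the published ranges overlap. Function names and table counts are hypothetical.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table.

    table[i][j]: subjects placed in category i by the Internet survey
    and in category j by the phone survey.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_margins = [sum(table[i]) / n for i in range(k)]
    col_margins = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_margins, col_margins))
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch_label(kappa):
    """Label kappa with the cutoffs quoted in the text (boundaries go to the higher band)."""
    if kappa < 0:
        return "Poor"
    for cutoff, label in ((0.80, "Almost Perfect"), (0.60, "Substantial"),
                          (0.40, "Moderate"), (0.20, "Fair")):
        if kappa >= cutoff:
            return label
    return "Slight"


print(round(cohen_kappa([[20, 5], [10, 15]]), 2))  # 0.4
print(landis_koch_label(0.73))                     # Substantial
```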
Prevalence differences between the ordinal variables in Table 3 were assessed using generalized estimating equation (GEE) models for multinomial data with a working exchangeable correlation matrix, as implemented in PROC GENMOD of SAS 8.2 (SAS Institute Inc., 2001). Extensions of kappa-type statistics to ordinal data proposed by Cohen (1968) require weights for the cells corresponding to partial agreement. In Table 3, we used linearly decreasing weights of the form 1−|i−j|/(k−1), where i and j refer to the row and column scores and k is the number of response categories. Health status was rated on a 5-point scale, whereas household income was scored using the category midpoints for all categories other than the last, for which a sensitivity analysis was conducted by varying the midpoint from US$125,000 to US$150,000. Confidence intervals can be calculated using the normal approximation of Fleiss, Cohen, and Everitt (1969).
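The linearly weighted kappa can be sketched as follows. When categories are scored by something other than their index (as with the income midpoints), this sketch assumes the natural generalization of the weights, 1 − |s_i − s_j| / (s_max − s_min), which reduces to 1 − |i − j|/(k − 1) for scores 1, ..., k. The function name, the 3-category table, and the midpoints are hypothetical illustrations, not study data.

```python
def weighted_kappa(table, scores=None):
    """Cohen's weighted kappa with linear agreement weights.

    table[i][j]: subjects in category i by one method and j by the other.
    scores: numeric score per category (default 1..k); weights are
    1 - |s_i - s_j| / (max(scores) - min(scores)).
    """
    k = len(table)
    if scores is None:
        scores = list(range(1, k + 1))
    span = max(scores) - min(scores)
    w = [[1 - abs(scores[i] - scores[j]) / span for j in range(k)] for i in range(k)]
    n = sum(sum(row) for row in table)
    rows = [sum(table[i]) / n for i in range(k)]
    cols = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_o = sum(w[i][j] * table[i][j] / n for i in range(k) for j in range(k))
    p_e = sum(w[i][j] * rows[i] * cols[j] for i in range(k) for j in range(k))
    return (p_o - p_e) / (1 - p_e)


# Hypothetical sensitivity analysis: vary the midpoint of the top category.
table = [[30, 5, 0], [4, 40, 3], [0, 2, 16]]
for top_midpoint in (125_000, 150_000):
    scores = [15_000, 40_000, top_midpoint]
    print(top_midpoint, round(weighted_kappa(table, scores), 3))
```

With only two categories the linear weights reduce to 1 on the diagonal and 0 elsewhere, so this function then returns the unweighted kappa.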
Finally, in Table 4 the internal consistency of several continuous scales is examined using Cronbach’s alpha reliability coefficient (Cronbach, 1951) with 95% confidence intervals obtained according to van Zyl, Neudecker, and Nel (2000). Between-method comparisons were then performed by applying the methods outlined in Donner (1998) for comparing correlated alpha coefficients.
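Cronbach's alpha itself is a short computation on item-level data: alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The sketch below uses sample variances, with subjects as rows and items as columns. The data are made up, and the confidence intervals and between-method tests cited above are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(items_by_subject):
    """Cronbach's alpha; rows are subjects, columns are scale items."""
    k = len(items_by_subject[0])
    item_vars = [variance([row[j] for row in items_by_subject]) for j in range(k)]
    total_var = variance([sum(row) for row in items_by_subject])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 4))  # 0.6667
```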
The study was run for 16 weeks between June 2005 and September 2005. During that time, 297 individuals were invited to participate. Of those, 288 accepted (97%) and 221 completed the survey for a final recruitment rate of 76.7%. Eight individuals completed the online survey more than 1 week after their telephone assessment and were excluded from analyses, resulting in a final sample of 213.
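The recruitment figures above can be re-derived with simple arithmetic. The sketch below (variable names are illustrative) also shows that the 67 noncompleters compared with completers later in the Results appear to be the individuals who accepted the invitation but never completed the survey (288 − 221).

```python
invited, accepted, completed, late = 297, 288, 221, 8

acceptance_rate = accepted / invited     # invitees who accepted
recruitment_rate = completed / accepted  # accepters who completed the survey
analyzed = completed - late              # completed within 1 week of the phone assessment
noncompleters = accepted - completed     # accepters who never completed

print(f"{acceptance_rate:.1%} {recruitment_rate:.1%} {analyzed} {noncompleters}")
# 97.0% 76.7% 213 67
```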
Most participants were female (65.3%). With respect to race, 79.8% were White, 11.7% Black, 4.2% Asian, and 3.3% American Indian or Alaskan Native, and 2 individuals (<1%) were Native Hawaiian or other Pacific Islander. The average age of participants was 35.5 years (SD=10.2; range 18–70). Almost half of participants had completed 1–3 years of college (49.3%), followed by 31.9% with ≥4 years of college, 16.0% with a high school degree or GED, and 2.8% with less than a high school degree. Household income was less than US$30,000 per year for 27.1% of participants, US$30,000 to US$50,000 for 33.3%, and US$50,000 or more for 39.5%.
The majority of participants (92.5%) reported that they were planning to quit in the next 30 days. On average, participants smoked 19.3 cigarettes per day (SD=9.9, range 5–60), had their first puff of a cigarette at age 14.2 (SD=3.4, range 7–29), and became daily smokers at age 17 (SD=3.6; range 8–30). Participants had made an average of 2.9 quit attempts in the past year (SD=5.2; range 0–50) and reported higher levels of desire to quit (M=8.95, SD=1.54) than confidence in quitting (M=6.24, SD=2.29). The average score on the Fagerström Test for Nicotine Dependence (Heatherton, Kozlowski, Frecker, & Fagerström, 1991) was 5.03 (SD=2.37), with 44% of participants scoring 6 or above, indicating a high level of nicotine dependence (Fagerström, Kunze, Schoberberger, et al., 1996). Analyses of body mass index indicated that 32.1% were overweight and 24.4% were obese according to standards of the Centers for Disease Control and Prevention (2004). Of those who reported current alcohol use (74.2%, n=152), 60.8% indicated they had had more to drink than they meant to in the past year and 21.5% indicated they wanted or needed to cut down on their drinking.
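The weight-status percentages rest on the standard CDC adult BMI cut points. The sketch below assumes BMI was computed from self-reported weight and height (the text does not say how these were obtained); the function names and example values are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def cdc_adult_category(bmi_value):
    """CDC adult categories: <18.5 underweight, 18.5 to <25 normal, 25 to <30 overweight, >=30 obese."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25:
        return "normal weight"
    if bmi_value < 30:
        return "overweight"
    return "obese"


print(cdc_adult_category(bmi(80, 1.75)))  # overweight (BMI about 26.1)
```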
The majority of participants had used the Internet for more than 5 years (79.8%), accessed the Internet several times a day (76.1%), and used a high-speed Internet connection (85.8%). Almost half (44.1%) of participants used the Internet to communicate with other people through services like an Internet blog, online bulletin board, or instant messaging.
Survey completers (n=213) were compared with noncompleters (n=67) to determine whether there were group differences in demographic, smoking, and psychosocial characteristics. There were no differences on any of the variables tested, which included age, gender, race, education, employment, income, marital status, smoking rate, nicotine dependence score (Fagerström Test for Nicotine Dependence), desire or confidence in quitting, and duration or frequency of Internet use.
Of the continuous variables examined in Table 1, the only one showing a difference of moderate effect size between the Internet- and phone-administered versions is the Negative Affect Situations subscale of the Smoking Temptations Questionnaire, with the mean of the phone-administered measure 0.44 standard units higher than the mean of the Internet-administered version (p<.0001). Positive Affect also tends to be higher when reported over the phone, although the observed difference of 0.17 standard units is considerably smaller, albeit statistically significant (p=.0193). As a result, the Total Score of the Smoking Temptations Questionnaire shows an overall difference in the small-to-moderate range, with the two sample means 0.34 standard units apart (p<.0001). Other measures, such as the CES-D, show statistically significant differences even though the observed effect sizes are small, a consequence of the ample power our sample size affords for detecting within-subject differences in continuous outcomes. Apart from the Smoking Temptations subscales, then, there is little evidence of systematic bias of practical importance between the two survey methods, and this is accompanied by generally strong intraclass correlations, as shown in Table 1.
With one exception, the binary variables listed in Table 2 show no significant differences between the two survey methods. However, the power of McNemar's test in the present study is quite low for all but large effect sizes, because the number of discordant pairs is small (N_D<23 for every variable except the one noted next). The exception is self-reported smoking-related illness, whose prevalence is considerably higher when assessed over the phone (59.91% vs. 49.53%, N_D=32, p<.0001). The ordinal variables listed in Table 3 (income, health status) also show no significant differences in prevalence (p>.10).
The test–retest reliabilities for continuous variables measured in the two surveys are uniformly high (above 80%) for the Partner Interaction Questionnaire (PIQ) and the alcohol consumption measures, and moderately strong (in the 70%–80% range) for the Perceived Stress Scale (PSS) and CES-D. As seen in Table 1, the results are least satisfactory for the individual subscales of the Smoking Temptations Questionnaire, all of which fall below the 70% reliability threshold. Still, the overall scale (Total Score) is more reliable, as would be expected of a composite of three correlated subscales, with its ICC reaching exactly the 70% threshold.
Substantial test–retest agreement was also obtained for the binary variables in Table 2, with kappa values above .70 for all but four variables assessing prior use of quit methods: individual counseling, nicotine nasal spray, Internet treatment, and telephone counseling. However, the reliability estimates are less precise for binary than for continuous measures, and the 95% confidence intervals are quite wide, allowing for the possibility that four additional variables show only moderate agreement between the two survey methods: use of group counseling, the nicotine inhaler, or switching to chewing tobacco or snuff as methods to quit smoking, and report of ever having had a smoking-related illness.
In Table 3, we find evidence of substantial agreement for health status (weighted κ=.73) and almost perfect agreement for the income measure (weighted κ=.93).
Results for the income measure did not change when the midpoint of the highest income category used to construct the weights was moved from US$125,000 to US$150,000. Because ordinal measures carry more information than binary ones, the confidence intervals in Table 3 are narrower, indicating greater precision in the estimates.
Finally, in Table 4 we evaluated the internal consistency of four scales and used them to compare the two survey methods. Cronbach's alpha reliability coefficients exceed 80% for the CES-D under both methods, are in the 70% to 80% range for the PIQ and PSS, and fall below 70% only for the phone-administered version of the Smoking Temptations Questionnaire. Between-method comparisons show no statistically significant differences for the PIQ, PSS, and CES-D. The between-method difference for the Smoking Temptations Questionnaire is borderline significant at the 5% level; in this case, as in all others, the Internet version appears to have higher internal consistency than the phone version.
Results from this study indicate that the internal consistency and test–retest reliability of Internet-administered measures of stress, depression, social support for cessation, smoking temptations, quit methods, alcohol use, health status, and income are comparable to those of telephone-administered measures. These results add to a growing body of evidence supporting the use of the Internet to gather data in clinical trials. Given the often poor response rates to mailed surveys (Asch, Jedrziewski, & Christakis, 1997), the high cost of telephone surveys, and the limited generalizability of self-selected volunteers who participate in clinical settings, the Internet offers a reliable and cost-efficient means of obtaining data from participants enrolled in a clinical trial.
There were no systematic differences between survey completers and noncompleters in this study. As expected, the sample was highly motivated to quit smoking and had a smoking history profile (e.g., age of onset, desire to quit) representative of smokers who proactively seek treatment. Although women commonly enroll in clinical trials at higher rates than men, the Internet-based sample may include an even larger proportion of women and may be slightly younger than participants in typical clinical studies. Minority representation was small but reasonable given general Internet demographics; as expected, however, the sample was primarily White and familiar with the Internet. Although the results may be limited to this group, it is reasonable to expect that this is the group most likely to participate in Internet trials. The moderate to high degree of nicotine dependence and the presence of co-occurring conditions (stress, depression, alcohol use, overweight) seen in this sample may make cessation more difficult. Higher CES-D scores on the Internet version may indicate that participants are more comfortable reporting negative affect via the Internet than in a telephone interview.
Reports of ever having had a smoking-related illness were more common in the telephone assessment than via the Internet. This is likely related to the wording of the question and to some participants' confusion about what constitutes a smoking-related illness. During the telephone assessment, research staff were able to clarify that the question included any health condition the participant believed was caused by smoking, which may explain the higher endorsement rate than in the Internet version. Income is often thought to be a sensitive question that participants may not feel comfortable answering; however, our findings suggest that this was not a problem here, as participants provided essentially the same income information online as via telephone (weighted κ=.93).
Test–retest reliability coefficients were lowest for the Smoking Temptations Questionnaire, with ICCs ranging from .66 to .68 for the three subscales. This finding is likely related to the dynamic nature of cigarette craving, especially in a sample of smokers who had recently enrolled in a cessation trial (Velicer et al., 1990). Differences in internal consistency for this measure may also reflect difficulty understanding the question and/or the rating scale when it is administered by telephone; in our experience, this measure required repeated explanation during telephone administration. Having the question and response options visually available during Internet administration may facilitate response fidelity.
The study is limited by the need to administer brief forms of the measures to reduce respondent burden, as well as by the self-selection biases inherent in any recruitment method. However, for clinical trials conducted over the Internet, the results are encouraging. The sample appears to be quite representative of smokers who are interested in quitting. Future studies will need to replicate and extend these results to other measures (including measures of cessation and its verification as treatment outcomes) and to other recruitment methods (e.g., proactive vs. reactive recruitment; recruitment via mail, telephone, print, radio, or mass media). For cessation trials that examine individual differences and mechanisms of action (e.g., mediation/moderation models; the role of comorbidity in treatment), the present study provides, to our knowledge, some of the first evidence that these measures are psychometrically sound and can be used in Internet-based smoking cessation research trials.
This research was supported by the National Cancer Institute (5R01CA104836-02). We acknowledge and thank Lan Jiang, M.S., Crystal Davis, MPH, and Pearl Zakroysky, B.A. for their assistance on this project.